UNIVERSALITY OF COVARIANCE MATRICES
In this paper we prove the universality of covariance matrices of the form $H_{N \times N} = X^\dagger X$,
where $[X]_{M \times N}$ is a rectangular matrix with independent real valued entries $[x_{ij}]$
satisfying $\mathbb{E}\, x_{ij} = 0$ and $\mathbb{E}\, x_{ij}^2 = \frac{1}{M}$, $N, M \to \infty$. Furthermore it is assumed that
these entries have sub-exponential tails. We will study the asymptotics in the regime
$N/M = d_N \in (0, \infty)$, $\lim_{N \to \infty} d_N \ne 0, 1, \infty$. Our main result states that the Stieltjes
transform of the empirical eigenvalue distribution of $H$ is given by the Marcenko-Pastur
law uniformly up to the edges of the spectrum with an error of order $(N\eta)^{-1}$, where $\eta$
is the imaginary part of the spectral parameter in the Stieltjes transform. From this
strong local Marcenko-Pastur law, we derive the following results. 1. The rigidity of
eigenvalues: If $\gamma_j = \gamma_{j,N}$ denotes the classical location of the $j$-th eigenvalue under the
Marcenko-Pastur law ordered in increasing order, then the $j$-th eigenvalue $\lambda_j$ of $H$ is
close to $\gamma_j$ in the sense that for some positive constants $C$, $c$,

\[ \mathbb{P}\Big( \exists\, j : |\lambda_j - \gamma_j| \ge (\log N)^{C \log\log N} \big[ \min(\min(N, M) - j,\ j) \big]^{-1/3} N^{-2/3} \Big) \le C \exp\big[ -(\log N)^{c \log\log N} \big] \]

for $N$ large enough. 2. The delocalization of the eigenvectors of the matrix $X X^\dagger$ uniformly
both at the edge and the bulk. 3. Bulk universality, i.e., the $n$-point correlation
functions of the eigenvalues of the sample covariance matrix $X^\dagger X$ coincide with those
of the Wishart ensemble, when $N$ goes to infinity. 4. Universality of the eigenvalues of
the sample covariance matrix $X^\dagger X$ at both edges of the spectrum. Furthermore the first
two results are applicable even in the case in which the entries of the column vectors
of $X$ are not independent but satisfy a certain large deviation principle. All our results
hold for both real and complex valued entries.



1. Introduction. In this paper we prove the universality of covariance matrices. Let $X =
(x_{ij})$ be an $M \times N$ data matrix with independent centered real valued entries with variance $M^{-1}$:

\[ x_{ij} = M^{-1/2} q_{ij}, \qquad \mathbb{E}\, q_{ij} = 0, \qquad \mathbb{E}\, q_{ij}^2 = 1. \tag{1.1} \]

Furthermore, the entries $q_{ij}$ have a sub-exponential decay, i.e., there exists a constant $\vartheta > 0$
such that for $u > 1$,

\[ \mathbb{P}(|q_{ij}| > u) \le \exp(-u^{\vartheta}). \tag{1.2} \]
AMS 2000 subject classifications: 15B52, 82B44
The covariance matrix corresponding to the data matrix $X$ is given by $H = X^\dagger X$. We will be
working in the regime

\[ d = d_N = N/M, \qquad \lim_{N \to \infty} d \ne 0, 1, \infty. \]

Thus without loss of generality, henceforth we will assume that for some small constant $\theta$,
for all $N \in \mathbb{N}$,

\[ \theta < d_N < \theta^{-1} \quad \text{and} \quad \theta < |d_N - 1|. \]

All our constants may depend on $\theta$ and $\vartheta$ but we will not denote this dependence. In this
paper we focus on the case where the matrix X has real valued entries which is a natural 
assumption for applications in statistics, economics etc. However all of the results in this 
paper also hold for complex valued entries with the moment condition (1.1) replaced with 
its complex valued analogue: 

\[ x_{ij} = M^{-1/2} q_{ij}, \qquad \mathbb{E}\, q_{ij} = 0, \qquad \mathbb{E}\, q_{ij}^2 = 0, \qquad \mathbb{E}\, |q_{ij}|^2 = 1. \tag{1.3} \]

Furthermore, in some of the results in the present work, the independence assumption of 
the matrix entries is weakened, see Theorem 1.7, which can be used to show universality of 
other matrix ensembles such as correlation matrices, see [1] and [27]. 

Covariance matrices are fundamental objects in modern multivariate statistics where the 
advance of technology has led to high dimensional data. They have manifold applications in
various applied fields; see [4, 18, 19, 20] for an extensive account on statistical applications, 
[17, 24] for applications in economics and [25] in population genetics to name a few. In 
the regime we study in this paper where $N$, $M$ are proportional to each other, the exact
asymptotic distribution of the eigenvalues is not known, except for some cases under specific 
assumptions on the distributions of the entries of the covariance matrix, e.g., the entries are 
Gaussian. In this context, akin to the central limit theorem, the phenomenon of universality 
helps us to obtain the asymptotic distribution of the eigenvalues, without having restrictive 
assumptions on the distribution on the entries. Borrowing a physical analogy, as observed 
by Wigner, the eigenvalue gap distribution for a large complicated system is universal in the 
sense that it depends only on the symmetry class of the physical system but not on other 
detailed structures. 

A fundamental example is the well studied Wishart matrix (the covariance matrix obtained 
from a data matrix X consisting of i.i.d centered Gaussian random variables) for which one 
has closed form expressions for many objects of interest including the joint distribution of the 
eigenvalues. In this paper we prove the universality of covariance matrices (both at the bulk 
and at the edges) under the assumption that the entries of the corresponding data matrix 
are independent, have mean 0, variance 1 and have a sub-exponential tail decay. This implies
that, asymptotically the distribution of the local statistics of eigenvalues of the covariance
matrices of the above kind are identical to those of the Wishart matrix.

Over the past two decades, great progress has been made in proving the universality
properties of i.i.d. matrix elements (standard Wigner ensembles). The most general results
to date for the universality of Wigner ensembles are obtained in Theorems 7.3 and 7.4 of
[7], in which bulk (edge) universality is proved for Wigner matrices under the assumption
that entries have a uniformly bounded $4 + \varepsilon$ ($12 + \varepsilon$) moment for some $\varepsilon > 0$, and then
recently improved further by [11] and [23]. The key ideas for the universality of Wigner 
ensembles were developed through several important steps in [8, 9, 12, 13, 14]. The ideas 
we use in this paper are also adapted from the above cited papers. There are also related 
results by [30, 31]. However results regarding universality for covariance matrices have been 
obtained only recently [2, 10, 16, 26, 28, 15, 29, 32]¹. Moreover these results are obtained
under strong assumptions; for example in the "four moment theorem" of [29], universality 
results are proved under the assumption that the first four moments of the elements of 
the data matrix are equal to those of the standard Gaussian. In [10] the authors prove bulk 
universality of covariance matrices under the assumption that the distribution of the matrix elements
has a smooth density. These results, although quite interesting, exclude many important
cases including the Bernoulli ensembles. On the other hand, our results do not require the 
smoothness of the distribution of the matrix entries and only need the first two moments of 
the entries of the data matrix to be identical to those of the standard Gaussian. Furthermore, 
some of our results are applicable even in situations where the entries in the same column are
not independent, but satisfy a certain large deviation bound as explained below. Indeed, 
using the new idea on the Green function comparison for matrices with nonindependent 
entries developed in this paper, we prove the edge universality of correlation matrices in a 
companion paper [27]. This new method has also been used in [1] recently. Notice that in 
this work, we do require an exponential tail decay condition for the matrix entries. However 
we believe that all of our results can be proved with the exponential tail condition replaced 
by a uniform bound on the $p$-th moment of the matrix elements (say $p = 5$ or $7$), using the
methods in [7] and [23] and we will pursue this elsewhere. 

The approach we take in this paper to prove universality is the one developed in a recent 
series of papers [6, 7, 8, 9, 10, 12, 13, 14]. Before stating the main results precisely, let us 
give a broad overview of our work. The first step is to derive a strong local Marcenko-Pastur 
law, a precise estimate of the local eigenvalue density in the optimal scale $N^{-1+o(1)}$, which
is our key technical tool for proving rigidity and universality. En route to this, we also 
obtain precise bounds on the matrix elements of the corresponding Green function. One key 
new input is a simple, general Theorem (Theorem 3.3) on the cancellation of the weakly 

¹ Some of the results in [2, 26, 28, 29] pertain to the case $\lim d_N = 0, 1$ or $\infty$ (e.g., bulk universality in
some cases where $\lim d_N = 1$ [29]), which will not be discussed in this paper.

coupled random variables. This lemma is analogous to Lemma 4.1 in [7], but our form is
more general, and besides covariance matrices, it has been used in the context of correlation 
matrices and non-Hermitian matrices [3, 27]. For proving bulk universality of eigenvalues,
the next step is to embed the covariance matrix into a stochastic flow of matrices so that
the eigenvalues evolve according to a distinguished coupled system of stochastic differential 
equations, called the Dyson Brownian motion [5]. An important idea in the papers mentioned 
above is to estimate the time to local equilibrium for the Dyson Brownian motion with the 
introduction of a new stochastic flow, the local relaxation flow, which locally behaves like a 
Dyson Brownian motion but has a faster decay to global equilibrium. This approach, first 
introduced in [8, 10], entirely eliminates the usage of explicit formulas. We will also follow 
this route and use the strong local Marcenko-Pastur law to show that the time for the 
Dyson Brownian motion (corresponding to the covariance matrix) to reach local equilibrium 
is about $O(N^{-1})$. Once we prove this result, all that remains to be done is to show that
the local statistics at $t = O(N^{-1})$ coincide with those of the initial matrix, i.e., $t = 0$. To this
end, we use a strategy called the 'Green function comparison method'. Roughly speaking,
the Green function comparison method exploits the fact that the equilibrium time is very
"small" ($O(N^{-1})$) and therefore the first few moments of the matrix entries at time $t = O(N^{-1})$
will be nearly identical to those at $t = 0$ (see Sections 5 and 6 for more details). This method
was first used in [13] to prove the bulk universality of the generalized Wigner matrix and 
subsequently used in a number of papers [14, 6, 7, 21, 22]. Our method for proving bulk 
universality essentially follows the same method; however as mentioned above, we introduce 
a completely novel method for proving edge universality as will be explained below. 
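Define the Green function of $X^\dagger X$ by

\[ G_{ij}(z) = \Big( \frac{1}{X^\dagger X - z} \Big)_{ij}, \qquad z = E + i\eta, \quad E \in \mathbb{R}, \quad \eta > 0. \tag{1.4} \]

The Stieltjes transform of the empirical eigenvalue distribution of $X^\dagger X$ is given by

\[ m(z) := \frac{1}{N} \sum_{j} G_{jj}(z) = \frac{1}{N} \operatorname{Tr} \frac{1}{X^\dagger X - z}. \tag{1.5} \]
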
We will be working in the regime

\[ d = d_N := N/M, \qquad \lim_{N \to \infty} d \ne 0, 1, \infty. \tag{1.6} \]

The cases $\lim_{N \to \infty} d_N \in \{0, 1, \infty\}$ will be studied in our future work, since the limiting spectra
have a different structure, but we believe that many methods in the present paper can be
applied to those cases. In particular, for the case $\lim d_N = 1$, the distribution of the largest
eigenvalues still satisfies the Tracy-Widom law, and it can be proved as in this paper with
very slight revision.
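Define

\[ \lambda_\pm := (1 \pm \sqrt{d})^2. \tag{1.7} \]

The Marchenko-Pastur (henceforth abbreviated by MP) law is given by:

\[ \varrho_c(x) = \frac{1}{2\pi d} \sqrt{\frac{[(\lambda_+ - x)(x - \lambda_-)]_+}{x^2}}. \tag{1.8} \]
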
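We define $m_c(z)$, $z \in \mathbb{C}$, as the Stieltjes transform of $\varrho_c$, i.e.,

\[ m_c(z) = \int_{\mathbb{R}} \frac{\varrho_c(x)}{x - z} \, dx. \tag{1.9} \]

The function $m_c$ depends on $d$ and has the closed form solution

\[ m_c(z) = \frac{1 - d - z + i \sqrt{(z - \lambda_-)(\lambda_+ - z)}}{2 d z}, \tag{1.10} \]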



where $\sqrt{\cdot}$ denotes the square root on the complex plane whose branch cut is the negative real
line. One can check that $m_c(z)$ is the unique solution of the equation

\[ m_c(z) + \frac{1}{z - (1 - d) + z\, d\, m_c(z)} = 0 \]

with $\Im m_c(z) > 0$ when $\Im z > 0$. Define the normalized empirical counting function by

\[ n(E) := \frac{1}{N} \#\{\lambda_j \ge E\}. \tag{1.11} \]

Let

\[ n_c(E) := \int_E^{\infty} \varrho_c(x) \, dx \tag{1.12} \]

so that $1 - n_c(\cdot)$ is the distribution function of the MP law.
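The formulas (1.7)-(1.12) can be checked numerically. The following is a minimal sketch in Python (standard library only), not part of the original argument. The function names `mp_edges`, `mp_density`, `mp_stieltjes`, `n_c` and `self_consistent_residual` are illustrative assumptions, and the midpoint rule is just one convenient quadrature choice. The square root uses the principal branch of `cmath.sqrt`, whose cut is the negative real axis, matching the convention stated after (1.10).

```python
import cmath
import math


def mp_edges(d):
    """Spectral edges lambda_- and lambda_+ from (1.7)."""
    return (1 - math.sqrt(d)) ** 2, (1 + math.sqrt(d)) ** 2


def mp_density(x, d):
    """Marchenko-Pastur density (1.8); zero outside [lambda_-, lambda_+]."""
    lam_minus, lam_plus = mp_edges(d)
    inside = (lam_plus - x) * (x - lam_minus)
    if inside <= 0 or x <= 0:
        return 0.0
    return math.sqrt(inside) / (2 * math.pi * d * x)


def mp_stieltjes(z, d):
    """Closed form (1.10), using the principal branch of the square root."""
    lam_minus, lam_plus = mp_edges(d)
    root = cmath.sqrt((z - lam_minus) * (lam_plus - z))
    return (1 - d - z + 1j * root) / (2 * d * z)


def n_c(E, d, steps=20000):
    """Counting function (1.12): midpoint-rule integral of the density over [E, lambda_+]."""
    lam_minus, lam_plus = mp_edges(d)
    lo = max(E, lam_minus)
    if lo >= lam_plus:
        return 0.0
    h = (lam_plus - lo) / steps
    return h * sum(mp_density(lo + (k + 0.5) * h, d) for k in range(steps))


def self_consistent_residual(z, d):
    """Left-hand side of m_c + 1 / (z - (1 - d) + z d m_c) = 0."""
    m = mp_stieltjes(z, d)
    return m + 1 / (z - (1 - d) + z * d * m)
```

For instance, with $d = 1/2$ one expects `n_c(0.0, 0.5)` to be close to 1 (for $d < 1$ the MP law carries no atom at zero), and `self_consistent_residual(1 + 1j, 0.5)` to vanish up to rounding.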

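By the singular value decomposition of $X$, there exist orthonormal bases $\{u_1, u_2, \ldots, u_M\} \subset
\mathbb{C}^M$ and $\{v_1, \ldots, v_N\} \subset \mathbb{C}^N$ such that

\[ X = \sum_{\alpha=1}^{M} \sqrt{\lambda_\alpha}\, u_\alpha v_\alpha^\dagger = \sum_{\alpha=1}^{N} \sqrt{\lambda_\alpha}\, u_\alpha v_\alpha^\dagger, \tag{1.13} \]

where $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_{\max\{M,N\}} \ge 0$, $\lambda_\alpha = 0$ for $\min\{N, M\} + 1 \le \alpha \le \max\{N, M\}$, and we
let $v_\alpha = 0$ if $\alpha > N$ and $u_\alpha = 0$ for $\alpha > M$. We also define the classical location of the
eigenvalues with $\varrho_c$ as follows:

\[ \int_{\gamma_j}^{\lambda_+} \varrho_c(x) \, dx = \int_{\gamma_j}^{+\infty} \varrho_c(x) \, dx = j/N. \tag{1.14} \]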
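The classical locations in (1.14) have no closed form, but they are easy to approximate. The sketch below (an illustration under stated assumptions, not the paper's procedure) solves for $\gamma_j$ by bisection, using a midpoint-rule approximation of the MP mass to the right of a point. The helper names `upper_mass` and `classical_location` are hypothetical, and the example assumes $d < 1$ and $1 \le j \le N$ so that the target mass $j/N$ is attainable.

```python
import math


def mp_density(x, d):
    """Marchenko-Pastur density (1.8) for aspect ratio d = N / M."""
    lam_minus = (1 - math.sqrt(d)) ** 2
    lam_plus = (1 + math.sqrt(d)) ** 2
    inside = (lam_plus - x) * (x - lam_minus)
    if inside <= 0 or x <= 0:
        return 0.0
    return math.sqrt(inside) / (2 * math.pi * d * x)


def upper_mass(E, d, steps=2000):
    """Midpoint-rule approximation of the integral of the density over [E, lambda_+]."""
    lam_plus = (1 + math.sqrt(d)) ** 2
    if E >= lam_plus:
        return 0.0
    h = (lam_plus - E) / steps
    return h * sum(mp_density(E + (k + 0.5) * h, d) for k in range(steps))


def classical_location(j, N, d, tol=1e-8):
    """Bisection for gamma_j in (1.14): the mass of [gamma_j, lambda_+] equals j / N."""
    lo = (1 - math.sqrt(d)) ** 2
    hi = (1 + math.sqrt(d)) ** 2
    target = j / N
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if upper_mass(mid, d) > target:
            lo = mid  # too much mass to the right of mid, so gamma_j lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With this convention $\gamma_1$ is the location nearest the upper edge $\lambda_+$, and the locations decrease as $j$ grows, in line with the ordering $\lambda_1 \ge \lambda_2 \ge \ldots$ used in (1.13).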

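Define the parameter

\[ \varphi := (\log N)^{\log \log N}. \tag{1.15} \]

For $\zeta \ge 0$, define the set

\[ S(\zeta) := \{ z \in \mathbb{C} : \mathbf{1}_{d > 1} (\lambda_-/5) \le E \le 5\lambda_+,\ \varphi^{\zeta} N^{-1} \le \eta \le 10(1 + d) \}. \tag{1.16} \]

Note that $m_c \sim 1$ in $S(0)$.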

Remark 1.1. One can see that the cases $d > 1$ and $d < 1$ are not symmetric in the
above definition. In fact, the proof of universality in the case $d > 1$ is much harder,
since $X^\dagger X$ has many zero eigenvalues. This issue can be easily avoided if the matrix entries are
independent, since $X^\dagger X$ and $X X^\dagger$ have the same non-zero eigenvalues. Since we do not
require the independence assumption, our proof is harder and longer.

Definition 1.2 (High probability events). Let $\zeta > 0$. We say that an event $\Omega$ holds with
$\zeta$-high probability if there exists a constant $C > 0$ such that

\[ \mathbb{P}(\Omega^c) \le N^C \exp(-\varphi^{\zeta}) \tag{1.17} \]
for large enough $N$.
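To get a feeling for the scales involved, the following small sketch evaluates the control parameter $\varphi$ of (1.15) and the logarithm of a failure bound of the form $N^C \exp(-\varphi^\zeta)$. The form of the bound is taken as an assumption here (it is how (1.17) is read above), and the function names `phi` and `log_failure_bound` are illustrative. The logarithm is returned to avoid floating-point underflow.

```python
import math


def phi(N):
    """Control parameter (1.15): (log N)^(log log N)."""
    log_n = math.log(N)
    return log_n ** math.log(log_n)


def log_failure_bound(N, zeta, C):
    """Logarithm of N^C * exp(-phi^zeta), the assumed bound on P(Omega^c) in (1.17)."""
    return C * math.log(N) - phi(N) ** zeta
```

Since $\varphi = \exp((\log\log N)^2)$, it eventually exceeds every fixed power of $\log N$ while staying below every fixed power of $N$; this is why factors $\varphi^{C_\zeta}$ are harmless next to powers of $N$, and why the failure probability decays faster than any inverse power of $N$.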

Our goal is to estimate the following quantities

\[ \Lambda_d := \max_k |G_{kk} - m_c|, \qquad \Lambda_o := \max_{k \ne l} |G_{kl}|, \qquad \Lambda := |m - m_c|, \tag{1.18} \]

where the subscripts refer to "diagonal" and "off-diagonal" matrix elements. All these quantities
depend on the spectral parameter $z$ and on $N$ but for simplicity we suppress this in
the notation.

The following is the main result of this paper (note that it holds for both real and
complex valued entries as mentioned in the introduction):
Theorem 1.3 (Strong local Marchenko-Pastur law). Let $X = [x_{ij}]$ with the entries
satisfying (1.1) and (1.2). For any $\zeta > 0$ there exists a constant $C_\zeta$ such that the following
events hold with $\zeta$-high probability.

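(i) The Stieltjes transform of the empirical eigenvalue distribution of $H$ satisfies

\[ \bigcap_{z \in S(C_\zeta)} \Big\{ \Lambda(z) \le \varphi^{C_\zeta} \frac{1}{N\eta} \Big\}. \tag{1.19} \]

(ii) The individual matrix elements of the Green function satisfy that



(iii) The smallest non zero and largest eigenvalues of $X^\dagger X$ satisfy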

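\[ \lambda_- - N^{-2/3} \varphi^{C_\zeta} \le \min_{j \le \min\{M, N\}} \lambda_j \le \max_j \lambda_j \le \lambda_+ + N^{-2/3} \varphi^{C_\zeta}. \tag{1.21} \]

(iv) Delocalization of the eigenvectors of $X^\dagger X$:

\[ \max_{\alpha} \| v_\alpha \|_\infty \le \varphi^{C_\zeta} N^{-1/2}. \tag{1.22} \]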

Remark 1.4. A weak version of (1.19) was obtained in [10]. However the error term 
obtained in (1.19) is of order {NriY^/'^/{K + {NrjY^/^f/'^ (k = min{|E - A+|, \E - A_|}j, 
whereas we need the above stronger result for our results, especially for the edge universality. 

The main theorem above is then used to obtain the following results: 

Theorem 1.5 (Rigidity of the eigenvalues of covariance matrix). Recall $\gamma_j$ in (1.14). Let
$X = [x_{ij}]$ with the entries $x_{ij}$ satisfying (1.1) and (1.2). For any $1 \le j \le N$, let

\[ \tilde{j} = \min\{ \min\{N, M\} + 1 - j,\ j \}. \]

For any $\zeta > 0$ there exists a constant $C_\zeta$ such that

\[ |\lambda_j - \gamma_j| \le \varphi^{C_\zeta} N^{-2/3} \tilde{j}^{-1/3} \tag{1.23} \]

and

\[ |n(E) - n_c(E)| \le \varphi^{C_\zeta} N^{-1} \tag{1.24} \]
hold with $\zeta$-high probability for any $1 \le j \le N$.
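The rigidity bound (1.23) interpolates between two scales: fluctuations of order $N^{-2/3}$ at the spectral edges and of order $N^{-1}$ in the bulk. A minimal sketch, reading (1.23) as $N^{-2/3} \tilde{j}^{-1/3}$ with $\tilde{j} = \min\{\min\{N, M\} + 1 - j, j\}$ and dropping the factor $\varphi^{C_\zeta}$; the function name `rigidity_scale` is an illustrative assumption:

```python
def rigidity_scale(j, N, M):
    """N^(-2/3) * j_tilde^(-1/3): the scale in (1.23) without the phi^C factor."""
    j_tilde = min(min(N, M) + 1 - j, j)
    return N ** (-2.0 / 3.0) * j_tilde ** (-1.0 / 3.0)
```

For $j = 1$ or $j = \min\{N, M\}$ one has $\tilde{j} = 1$ and the scale is $N^{-2/3}$, whereas for $j$ of order $N$ the scale is of order $N^{-1}$, i.e., comparable to the typical eigenvalue spacing in the bulk.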



The above two results are stated under the assumption that the matrix entries are indepen- 
dent. The independence assumption (of the elements in each column vector of X) required 
in Theorems 1.3 and 1.5 can be replaced with the following large deviation criteria. 

Let us first recall the following large deviation lemma for independent random variables 
(see [12], Appendix B for a proof). 

Lemma 1.6 (Large Deviation Lemma). Suppose $a_i$ are independent, mean $0$ complex variables,
with $\mathbb{E}|a_i|^2 = \sigma^2$, that have a sub-exponential decay as in (1.2). Then there exists a
constant $p = p(\vartheta) > 1$ such that, for any $\zeta > 0$ and for any $A_i \in \mathbb{C}$ and $B_{ij} \in \mathbb{C}$, the bounds

M 

J2ci^A<:{\ogMy'^a\\A\\ (1.25) 

i=l 

M AI AI 

I J]a,5,,a, -Y,^'B,,\ ^ {logMY^a' (^ m')'/^ (1.26) 

1=1 1=1 i=l 

hold with $\zeta$-high probability.

Next we extend Theorems 1.3 and 1.5 by relaxing the independence assumption. 

Theorem 1.7. Let $X = (x_{ij})$ be a random matrix with $\mathbb{E}(x_{ij}^2) = 1/M$. Assume that the
column vectors of the matrix $X$ are mutually independent. Furthermore, suppose that for
any fixed $j \le N$, the random variables defined by $a_i = x_{ij}$, $1 \le i \le M$, satisfy the large
deviation bounds (1.25), (1.26) and (1.27), for any $A_i \in \mathbb{C}$ and $B_{ij} \in \mathbb{C}$ and $\zeta > 0$. Then
the conclusions of Theorems 1.3 and 1.5 hold for the random matrix $X$.

Thus Theorem 1.7 extends the universality results to a large class of matrix ensembles. 
For instance, let $h_{ij}$ be a sequence of i.i.d. random variables and set

\[ x_{ij} = \frac{h_{ij}}{\sqrt{\sum_{k=1}^{M} h_{kj}^2}}, \qquad 1 \le i \le M, \quad 1 \le j \le N. \tag{1.28} \]

Thus the entries of the column vector $(x_{1j}, x_{2j}, \ldots, x_{Mj})$ are not independent, but exchangeable.
Clearly $\mathbb{E}(x_{ij}^2) = \frac{1}{M}$. The random variables $x_{ij}$ given by (1.28) are called self-normalized
sums and arise in various statistical applications. For instance, the matrix $X = [x_{ij}]$
constructed above is called the correlation matrix (see [18, 27]) and is often preferred in
applications such as the principal component analysis (PCA) due to the scale invariance of
the correlation matrix.

Proof of Theorem 1.7. In the proofs of Theorems 1.3 and 1.5, we only use the large
deviation properties of $a_i = x_{ij}$ and the fact that $\mathbb{E}(x_{ij}^2) = 1/M$, instead of independence and
sub-exponential decay. Therefore the proofs of Theorems 1.3 and 1.5 in fact yield Theorem
1.7. □
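The self-normalized construction (1.28) is easy to reproduce. The sketch below is only an illustration: the function name `self_normalized_matrix` is hypothetical, and standard Gaussian draws for $h_{ij}$ are an assumption made for the example (the text only requires the $h_{ij}$ to be i.i.d.). Each column of $h$ is divided by its Euclidean norm, so the entries within a column are dependent but exchangeable.

```python
import math
import random


def self_normalized_matrix(M, N, rng):
    """Build X as in (1.28): each column of an i.i.d. matrix h divided by its Euclidean norm."""
    # Gaussian entries are an illustrative choice, not a requirement of the text.
    h = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    X = [[0.0] * N for _ in range(M)]
    for j in range(N):
        norm = math.sqrt(sum(h[i][j] ** 2 for i in range(M)))
        for i in range(M):
            X[i][j] = h[i][j] / norm
    return X
```

Every column of the resulting matrix has unit norm, so the average of $x_{ij}^2$ over $i$ is exactly $1/M$ for each fixed $j$; combined with exchangeability this gives $\mathbb{E}(x_{ij}^2) = 1/M$, the normalization required in Theorem 1.7.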

Theorem 1.8 (Universality of eigenvalues in Bulk). Let $X^v = [x^v_{ij}]$ with the independent
entries satisfying (1.1) and (1.2), and let $X^w$ be defined similarly. Let $E \in [\lambda_- + r, \lambda_+ - r]$
with some $r > 0$. Then for any $\varepsilon > 0$, $N^{-1+\varepsilon} < b < r/2$, any fixed integer $n \ge 1$ and for any
compactly supported continuous test function $O : \mathbb{R}^n \to \mathbb{R}$ we have
i'?jrfI«'---"'(^t"^-a)(-'-^.-----'-^)n^^o 

(1.29) 

where $p^{(n)}_{v,N}$ and $p^{(n)}_{w,N}$ are the $n$-point correlation functions of the eigenvalues of $(X^v)^\dagger X^v$
and $(X^w)^\dagger X^w$, respectively.

Remark 1.9. The above result follows from Theorem 2.1 in [10] with some strong inputs 
from our Theorem 1.3 and 1.5. 

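Theorem 1.10 (Universality of extreme eigenvalues). Let $X^v = [x^v_{ij}]$ with independent
entries satisfying (1.1) and (1.2), and likewise $X^w$. Then there is an $\varepsilon > 0$ and $\delta > 0$ such that for
any real number $s$ (which may depend on $N$) we have

\[ \mathbb{P}^v\big( N^{2/3}(\lambda_1 - \lambda_+) \le s - N^{-\varepsilon} \big) - N^{-\delta} \le \mathbb{P}^w\big( N^{2/3}(\lambda_1 - \lambda_+) \le s \big) \le \mathbb{P}^v\big( N^{2/3}(\lambda_1 - \lambda_+) \le s + N^{-\varepsilon} \big) + N^{-\delta} \tag{1.30} \]

for $N \ge N_0$ sufficiently large, where $N_0$ is independent of $s$. An analogous result holds for
the smallest eigenvalue $\lambda_{\min\{M,N\}}$.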

The above result on edge universality is one of the key results of our work. This result 
uses the Green function comparison theorem as in the case of Wigner matrices, however 
unlike previous methods, where one compares the Green function of two matrices whose 
distribution are identical except for a single entry, we compare the Green function of two 
matrices whose entries are distributed identically except for one row. This new idea is not only
efficient, but also makes our approach useful for the case where the matrix entries are not
independent such as the correlation matrix. In particular, in Theorem 6.5 (see Section 6) we
give a sufficient criterion for proving edge universality for matrix ensembles of the form $Y^\dagger Y$
for a generic data matrix $Y$ with dependent entries (e.g., correlation matrices). Our method
is potentially useful for establishing universality for a huge class of matrix ensembles; in a
companion paper [27] we use this method to prove edge universality of correlation matrices
(also see [1] for another application of the method).

We also note that Theorem 1.10 can be extended to obtain universality of finite correlation
functions of extreme eigenvalues. For example, we have the following extension to (1.30):

\[ \begin{aligned}
& \mathbb{P}^v\big( N^{2/3}(\lambda_1 - \lambda_+) \le s_1 - N^{-\varepsilon}, \ldots, N^{2/3}(\lambda_k - \lambda_+) \le s_k - N^{-\varepsilon} \big) - N^{-\delta} \\
& \qquad \le \mathbb{P}^w\big( N^{2/3}(\lambda_1 - \lambda_+) \le s_1, \ldots, N^{2/3}(\lambda_k - \lambda_+) \le s_k \big) \\
& \qquad \le \mathbb{P}^v\big( N^{2/3}(\lambda_1 - \lambda_+) \le s_1 + N^{-\varepsilon}, \ldots, N^{2/3}(\lambda_k - \lambda_+) \le s_k + N^{-\varepsilon} \big) + N^{-\delta}
\end{aligned} \tag{1.31} \]

for all $k$ fixed and $N$ sufficiently large. The proof of (1.31) is similar to that of (1.30) and
we will not provide details except stating the general form of the Green function comparison
theorem (Theorem 6.4) needed in this case. We remark that edge universality is usually
formulated in terms of joint distributions of edge eigenvalues in the form (1.31) with fixed
parameters $s_1, s_2, \ldots$ etc. Our result holds uniformly in these parameters, i.e., they may
depend on $N$. However, the interesting regime is $|s_j| \le \varphi^{C_\zeta}$, otherwise the rigidity estimate
(1.23) gives a stronger control than (1.31).

In [26], [28] and [15], Péché, Soshnikov, Feldheim and Sodin proved that for covariance
matrices whose entries have a probability density function symmetric around $0$ (thus including
the Wishart matrix), the largest and smallest $k$ eigenvalues after appropriate centering
and rescaling converge in distribution to the Tracy-Widom law². We have the following
immediate corollary of Theorem 1.10:

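Corollary 1.11. Let $X$ have independent entries satisfying (1.1) and (1.2). For any
fixed $k > 0$, we have

\[ \left( \frac{M\lambda_1 - (\sqrt{N} + \sqrt{M})^2}{(\sqrt{N} + \sqrt{M}) \big( \frac{1}{\sqrt{N}} + \frac{1}{\sqrt{M}} \big)^{1/3}}, \ \ldots, \ \frac{M\lambda_k - (\sqrt{N} + \sqrt{M})^2}{(\sqrt{N} + \sqrt{M}) \big( \frac{1}{\sqrt{N}} + \frac{1}{\sqrt{M}} \big)^{1/3}} \right) \longrightarrow TW_1, \]

where $TW_1$ denotes the Tracy-Widom distribution. An analogous statement holds for the
smallest eigenvalues.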
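The centering and scaling in Corollary 1.11 can be related to the edge $\lambda_+$ and the $N^{2/3}$ normalization used in (1.30). The sketch below computes the two constants, reading the corollary as centering $M\lambda_k$ at $(\sqrt{N} + \sqrt{M})^2$ and dividing by $(\sqrt{N} + \sqrt{M})(N^{-1/2} + M^{-1/2})^{1/3}$; the function names `edge_center_and_scale` and `rescale_largest` are illustrative assumptions.

```python
import math


def edge_center_and_scale(N, M):
    """Centering and scaling constants applied to M * lambda_k in Corollary 1.11."""
    root_sum = math.sqrt(N) + math.sqrt(M)
    center = root_sum ** 2
    scale = root_sum * (1 / math.sqrt(N) + 1 / math.sqrt(M)) ** (1.0 / 3.0)
    return center, scale


def rescale_largest(lambda_1, N, M):
    """Rescaled largest eigenvalue (M * lambda_1 - center) / scale."""
    center, scale = edge_center_and_scale(N, M)
    return (M * lambda_1 - center) / scale
```

Dividing by $M$, the center becomes $(1 + \sqrt{d})^2 = \lambda_+$ and the scale becomes $N^{-2/3} d^{2/3} (1 + \sqrt{d})(1 + d^{-1/2})^{1/3}$ with $d = N/M$, so the corollary is a statement about $N^{2/3}(\lambda_k - \lambda_+)$ up to a constant depending only on $d$.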

The rest of the paper is organized as follows. In Sections 2-4 we establish the strong version 
of the Marcenko-Pastur law, rigidity and delocalization of eigenvectors. In Sections 5-6, we 
respectively prove the bulk and edge universality results. We conclude this section by
introducing some key definitions and notations used throughout the paper.



² Here we use the term Tracy-Widom law as in [28].

1.1. Notations and key definitions. Define

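\[ H := X^\dagger X, \qquad G(z) := (H - z)^{-1} = (X^\dagger X - z)^{-1}, \qquad m(z) := \frac{1}{N} \operatorname{Tr} G(z), \tag{1.32} \]

\[ \mathcal{G}(z) := (X X^\dagger - z)^{-1}. \]

Since the non-zero eigenvalues of $X X^\dagger$ and $X^\dagger X$ are identical and $X X^\dagger$ has $M - N$ more
(or $N - M$ fewer) zero eigenvalues,

\[ \operatorname{Tr} G(z) - \operatorname{Tr} \mathcal{G}(z) = \frac{M - N}{z}. \tag{1.33} \]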
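For completeness, the trace identity (1.33) can be spelled out as follows. Here $r$ denotes the number of non-zero eigenvalues $\lambda_1, \ldots, \lambda_r$ shared by $X^\dagger X$ and $X X^\dagger$; the symbol $r$ is introduced only for this computation. Each zero eigenvalue contributes $-1/z$ to the trace of the corresponding resolvent.

```latex
\operatorname{Tr} G(z) = \sum_{\alpha=1}^{r} \frac{1}{\lambda_\alpha - z} - \frac{N - r}{z},
\qquad
\operatorname{Tr} \mathcal{G}(z) = \sum_{\alpha=1}^{r} \frac{1}{\lambda_\alpha - z} - \frac{M - r}{z},
\qquad\Longrightarrow\qquad
\operatorname{Tr} G(z) - \operatorname{Tr} \mathcal{G}(z) = \frac{(M - r) - (N - r)}{z} = \frac{M - N}{z}.
```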

We will often need to consider minors of X defined below: 

Definition 1.12 (Minors). For $T \subset \{1, \ldots, N\}$ we define $X^{(T)}$ as the $(M \times (N - |T|))$
minor of $X$ obtained by removing all columns of $X$ indexed by $i \in T$. Note that we keep the
names of indices of $X$ when defining $X^{(T)}$,

\[ (X^{(T)})_{ij} := \mathbf{1}(j \notin T)\, X_{ij}. \]

The quantities $G^{(T)}(z)$, $\mathcal{G}^{(T)}(z)$, $\lambda_\alpha^{(T)}$, $u_\alpha^{(T)}$, $v_\alpha^{(T)}$ etc. are defined similarly using $X^{(T)}$.
Furthermore, we abbreviate $(i) = (\{i\})$ as well as $(iT) = (\{i\} \cup T)$. We also set

\[ m^{(T)}(z) := \frac{1}{N} \sum_{i \notin T} G^{(T)}_{ii}(z). \tag{1.34} \]

We denote the $i$-th column of $X$ by $x_i$, which is an $M \times 1$ vector. Recall $\lambda_+$, $\lambda_-$ from (1.7). For
$z = E + i\eta$, set

\[ \kappa := \min(|\lambda_+ - E|, |E - \lambda_-|). \tag{1.35} \]

Throughout the paper we will use the letters $C$, $C_\zeta$, $c$ to denote generic constants whose
precise value may change from one occurrence to the next but which are independent of everything
else. Finally, for two quantities $a$, $b$ we write $a \sim b$ to denote $cb \le a \le Cb$.

2. A priori bound for the strong local Marcenko-Pastur law. Our goal in this section is
to prove the following weaker form of Theorem 1.3, and in Section 4 we will use this a priori
bound to obtain the stronger form as claimed in Theorem 1.3.

Theorem 2.1. Let $X = [x_{ij}]$ with the entries $x_{ij}$ satisfying (1.1) and (1.2). For any
$\zeta > 0$ there exists a constant $C_\zeta$ such that the following event holds with $\zeta$-high probability



$z \in S(C_\zeta)$
2.1. A Roadmap for the reader. To convey the key ideas of the computations involved
in this section, we first give a brief outline of the proof of Theorem 2.1. For the reader's
convenience, we also indicate the corresponding theorems/lemmas in which the estimates
mentioned below are proved.

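The proof of Theorem 2.1 proceeds via "self consistent equations" explained below. Let
us fix $\zeta > 0$. By definition it follows that

\[ G_{ii}(z) = \frac{1}{-z - \frac{z}{M} \operatorname{Tr} \mathcal{G}^{(i)}(z) - Z_i}, \]

where

\[ Z_i := z\, (x_i, \mathcal{G}^{(i)} x_i) - \frac{z}{M} \operatorname{Tr} \mathcal{G}^{(i)}. \tag{2.2} \]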

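We will first establish Theorem 2.1 for $\Im z = \eta \sim 1$. For $\eta \sim 1$, the empirical Stieltjes transform
satisfies

\[ m(z) = \frac{1}{N} \sum_i \frac{1}{-z - \frac{z}{M} \operatorname{Tr} \mathcal{G}^{(i)}(z) - Z_i}, \qquad \max_i |Z_i| \le \varphi^{C_\zeta} \Psi \]

with $\zeta$-high probability (see Lemma 2.11) where

\[ \Psi := \sqrt{\frac{\Im m_c + \Lambda}{N\eta}} + \frac{1}{N\eta}. \tag{2.3} \]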

Remark 2.2. Notice that when $\Im m_c + \Lambda \le O(1)$, we have

\[ \Psi \le O\big( (N\eta)^{-1/2} \big). \tag{2.4} \]

Consequently, we deduce that for $\eta \sim 1$, the function $m(z)$ satisfies the following
"self-consistent" equation

\[ m(z) = \frac{1}{1 - z - d - z\, d\, m(z)} + O(\varphi^{C_\zeta} \Psi) \tag{2.5} \]

with $\zeta$-high probability. Notice that the above equation satisfied by $m(z)$ is nearly identical
to the fixed point equation satisfied by the Stieltjes transform of the MP law, namely

\[ m_c(z) + \frac{1}{z - (1 - d) + z\, d\, m_c(z)} = 0, \tag{2.6} \]

with $\Im m_c > 0$ when $\Im z > 0$. From (2.5) and (2.6), we immediately deduce that (Lemma
2.11) for $\eta \sim 1$, with $\zeta$-high probability,

\[ |m - m_c| = \Lambda(z) \le \varphi^{C_\zeta} \frac{1}{\sqrt{N}}. \tag{2.7} \]

We now use (2.7) to establish Theorem 2.1 (Equation (2.1)) for $\eta \sim 1$. To this end, we
identify the following "bad sets" (improbable events). For $z \in S(0)$, define

\[ \Omega(z, K) := \Big\{ \max\Big\{ \Lambda_o(z),\ \max_i |G_{ii}(z) - m(z)|,\ \max_i |Z_i| \Big\} \ge K \Psi(z) \Big\}. \tag{2.8} \]

Then the event (Lemma 2.10)

\[ \bigcap_{z \in S(0),\, \eta \sim 1} \Omega(z, \varphi^{C_\zeta})^c \tag{2.9} \]

holds with $\zeta$-high probability. The estimate (2.9) coupled with (2.7) immediately establishes
Theorem 2.1 for $\eta \sim 1$.

Before proceeding, we notice the following important point. When $\eta$ is not assumed to be
$\sim 1$, a statement analogous to (2.9) holds with a different assumption. Set

B{z) := [Ao{z)+Ad{z) > (logiV)-^} , (2.10) 

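\[ \Gamma(z, K) := \Omega(z, K)^c \cup B(z). \tag{2.11} \]
In Lemma 2.9 we show that

\[ \bigcap_{z \in S(C_\zeta)} \Gamma(z, \varphi^{C_\zeta}) \tag{2.12} \]

holds with $\zeta$-high probability. It can also be shown that for $\eta \sim 1$, the event $B^c$ holds with
$\zeta$-high probability.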

For proving the result for all $z \in S(C_\zeta)$ (i.e., for all $\eta \ge \varphi^{C_\zeta} N^{-1}$) we proceed as follows.
For a function $u(z)$, define its "deviance" to be

\[ \mathcal{D}(u)(z) := \big( u^{-1}(z) + z\, d\, u(z) \big) - \big( m_c^{-1}(z) + z\, d\, m_c(z) \big). \tag{2.13} \]

Clearly, $\mathcal{D}(m_c) = 0$. The plan is to show that $|\mathcal{D}(m)| \approx 0$ and therefore $|m_c - m| \approx 0$.
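It may help to make the deviance explicit. Using the fixed point equation $m_c(z) + (z - (1 - d) + z\, d\, m_c(z))^{-1} = 0$ satisfied by $m_c$, the second bracket in the definition of the deviance is an explicit affine function of $z$, and the equation $\mathcal{D}(u) = \delta$ becomes a quadratic equation in $u$. The following short derivation is a worked illustration of this point rather than a statement taken from the text.

```latex
% From the fixed point equation: 1/m_c = -(z - (1 - d) + z d m_c)
m_c^{-1}(z) + z\, d\, m_c(z) = 1 - d - z,
\qquad\text{hence}\qquad
\mathcal{D}(u)(z) = u^{-1}(z) + z\, d\, u(z) - (1 - d - z).
% Multiplying D(u) = \delta by u gives a quadratic equation in u:
\mathcal{D}(u) = \delta
\quad\Longleftrightarrow\quad
z\, d\, u^2 - (\delta + 1 - d - z)\, u + 1 = 0 .
```

In particular, a small deviance $\delta$ perturbs the coefficients of the quadratic equation solved by $m_c$, which is why control of $\mathcal{D}(m)$ translates into control of $m - m_c$.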

More precisely, suppose that for two numbers L,K satisfying ip^ ^ K'^ilogN)'^ and for 
some A C n_,^Q^^^^T{z, K) fl^^i B'^{z) (here we mean to say if ~ 1, then A C Q{z,Ky, 
i.e., A is not in the bad sets of z such that 2 ~ 1) one has the bound 

\V{m){z)\^6{z) + 00 1^^^^ WzeS{L) (2.14)
where 6 : IR+ is a continuous function, decreasing in Q^z and \S{z)\ ^ (logA^)"^. Then,
via a continuity argument we show in Lemma 2.13 that from (2.14) one indeed has the 
following stronger conclusion: 



A ^ C(logiV) 



yz E S(L) 



(2.15) 



and A C D^^^f^^^B^iz) (with A C n^^G,^^^T{z, K) n^^i B'^iz), it imphes that A C n{z,KY, 
i.e., A is not in bad sets of all z.). This estimate with a brief additional argument will yield 
that for large enough C and z G S{ip'"), we have A = o(l) and Q{z, ip'-^'-Y holds with C-high 
probability. These two conclusions immediately yield Theorem 2.1. 

2.2. Preliminary Lemmas. We start with the following elementary lemma whose proof is 
standard: 

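Lemma 2.3. For any square matrix $M$, and partition matrices $A$, $B$ and $D$ of $M$ given by

\[ M = \begin{pmatrix} A & B \\ B^\dagger & D \end{pmatrix}, \]

we have the following identity:

\[ M^{-1} = \begin{pmatrix} U^{-1} & -U^{-1} B D^{-1} \\ -D^{-1} B^\dagger U^{-1} & D^{-1} + D^{-1} B^\dagger U^{-1} B D^{-1} \end{pmatrix}, \qquad U = A - B D^{-1} B^\dagger. \]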



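Lemma 2.4. For any $z$ not in the spectrum of $X^\dagger X$, we have

\[ X (X^\dagger X - z)^{-1} X^\dagger = 1 + z (X X^\dagger - z)^{-1}. \]

Proof. Indeed from the singular value decomposition given in (1.13) we have

\[ X (X^\dagger X - z)^{-1} X^\dagger = \sum_{\alpha} \frac{\lambda_\alpha}{\lambda_\alpha - z}\, u_\alpha u_\alpha^\dagger = I + z (X X^\dagger - z)^{-1}, \]

and the lemma is proved. □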



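The next lemma collects the main identities of the resolvent matrix elements $G_{ij}(z)$ and $G^{(T)}_{ij}(z)$.

Lemma 2.5 (Resolvent identities).

\[ G_{ii}(z) = \frac{1}{-z - z\, (x_i, \mathcal{G}^{(i)}(z)\, x_i)}, \quad \text{i.e.,} \quad (x_i, \mathcal{G}^{(i)}(z)\, x_i) = \frac{-1}{z\, G_{ii}(z)} - 1, \tag{2.16} \]

\[ G_{ij}(z) = z\, G_{ii}(z)\, G^{(i)}_{jj}(z)\, (x_i, \mathcal{G}^{(ij)}(z)\, x_j), \qquad i \ne j, \tag{2.17} \]

\[ G_{ij}(z) = G^{(k)}_{ij}(z) + \frac{G_{ik}(z)\, G_{kj}(z)}{G_{kk}(z)}, \qquad i, j \ne k. \tag{2.18} \]

Proof. The proof only needs elementary linear algebra and follows from Lemma 2.3 and
Lemma 2.4 above and Lemma 3.2 of [14]. □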
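The first resolvent identity, $G_{ii}(z) = 1 / (-z - z\,(x_i, \mathcal{G}^{(i)}(z)\, x_i))$, can be sanity-checked on a small random matrix. The sketch below is an illustration under stated assumptions, not the argument of the proof: it uses real Gaussian entries of variance $1/M$ (so the adjoint is the transpose), a plain Gauss-Jordan inversion written for this example, and hypothetical helper names `matmul`, `transpose`, `resolvent` and `check_resolvent_identity`. The minor $X^{(i)}$ is formed by zeroing column $i$, following the convention of Definition 1.12.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def resolvent(H, z):
    """(H - z)^{-1} by Gauss-Jordan elimination with partial pivoting."""
    n = len(H)
    # Augment H - z*I with the identity matrix.
    aug = [[complex(H[r][c]) - (z if r == c else 0) for c in range(n)]
           + [1.0 + 0j if r == c else 0j for c in range(n)] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def check_resolvent_identity(M, N, i, z, seed=0):
    """Return both sides of G_ii = 1 / (-z - z (x_i, calG^(i) x_i)) for a random real X."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) / M ** 0.5 for _ in range(N)] for _ in range(M)]
    G = resolvent(matmul(transpose(X), X), z)
    # Minor X^(i): column i set to zero, indices kept as in Definition 1.12.
    X_minor = [[0.0 if c == i else X[r][c] for c in range(N)] for r in range(M)]
    calG_minor = resolvent(matmul(X_minor, transpose(X_minor)), z)
    x_i = [X[r][i] for r in range(M)]
    quad = sum(x_i[r] * calG_minor[r][s] * x_i[s] for r in range(M) for s in range(M))
    return G[i][i], 1 / (-z - z * quad)
```

A call such as `check_resolvent_identity(6, 4, 2, 1.0 + 0.5j)` returns the two sides of the identity, which should agree up to rounding for any spectral parameter with positive imaginary part.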

We record the following properties of $m_c$ without proof.

Lemma 2.6 (Properties of $m_c$). For $z \in S(0)$ we have the following bounds:

|m,(2)|~l, |1 -m2(z)| ~ 0~F^. (2.19) 

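\[ \Im m_c(z) \sim \begin{cases} \dfrac{\eta}{\sqrt{\kappa + \eta}} & \text{if } \kappa \ge \eta \text{ and } E \notin [\lambda_-, \lambda_+], \\[2mm] \sqrt{\kappa + \eta} & \text{if } \kappa \le \eta \text{ or } E \in [\lambda_-, \lambda_+]. \end{cases} \tag{2.20} \]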



Furthermore 



' ^ O — and dr, ^ ^ . 2.21 

Recall B{z) from (2.10). 

Lemma 2.7 (Rough bounds of $\Lambda_o^{(T)}$ and $\Lambda_d^{(T)}$). Fix $T \subset \{1, 2, \ldots, N\}$ such that $|T| =
O(1)$. For $z \in S(0)$, there exists a constant $C = C_{|T|}$ such that the following estimates hold
in $B^c(z)$:

\[ \max_k |G^{(T)}_{kk} - G_{kk}| \le C \Lambda_o^2, \tag{2.22} \]
\[ \frac{1}{C} \le |G^{(T)}_{kk}| \le C, \tag{2.23} \]
\[ \Lambda_o^{(T)} \le C \Lambda_o. \tag{2.24} \]

Proof. For $T = \emptyset$, (2.22) and (2.24) follow from definition, (2.23) follows from the
definition of $B(z)$ and (2.19). For nonempty $T$, one can prove the lemma using an induction
on $|T|$. For example, for $|T| = 1$, using (2.18) we can show that

\[ |G_{kk}(z) - G^{(T)}_{kk}(z)| \le C \Lambda_o^2, \tag{2.25} \]

which implies the bound (2.22). A similar argument will yield (2.23) and (2.24).



□
On the other hand, when $\eta \sim 1$, a bound similar to (2.23) holds without the assumption
of $B^c$.

Lemma 2.8 (Rough bounds for $G_{kk}$ for $\eta \sim 1$). Fix $T \subset \{1, 2, \ldots, N\}$ such that $|T| =
O(1)$. For any $z \in S(0)$ with $\eta \sim 1$, we have the bound

\[ \max_{1 \le k \le N} |G^{(T)}_{kk}(z)| \le C \]

for some $C > 0$.

Proof. Let us show the result first for $|T| = 0$. By definition

\[ |G_{kk}(z)| = \Big| \sum_\alpha \frac{|v_\alpha(k)|^2}{\lambda_\alpha - z} \Big| \le \frac{1}{\eta} \sum_\alpha |v_\alpha(k)|^2 \le C, \]

where we have used $|\lambda_\alpha - z| \ge \Im z = \eta$. The claim for a general $T$
follows similarly. □

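Recall from (2.8) and (2.11), the event

\[ \Gamma(z, \varphi^{C_\zeta}) = \Omega(z, \varphi^{C_\zeta})^c \cup B(z). \]

Define the events:

\[ \Omega_o(z, K) := \{ \Lambda_o \ge K \Psi(z) \}, \tag{2.26} \]
\[ \Omega_d(z, K) := \Big\{ \max_i |G_{ii}(z) - m(z)| \ge K \Psi(z) \Big\}, \]
\[ \Omega_Z(z, K) := \Big\{ \max_i |Z_i| \ge K \Psi(z) \Big\}. \]

Note: $\Omega_d(z, K)$ is defined in terms of $m$, not $m_c$. Set

\[ \Omega(z, K) = \Omega_o(z, K) \cup \Omega_d(z, K) \cup \Omega_Z(z, K). \]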
Lemma 2.9. For any $\zeta > 0$ there exists a constant $C_\zeta$ such that

\[ \bigcap_{z \in S(C_\zeta)} \Gamma(z, \varphi^{C_\zeta}) \tag{2.27} \]

holds with $\zeta$-high probability.

Proof. We only need to prove that there exists a uniform constant $C_\zeta$ such that for any
$z \in S(C_\zeta)$ the event

\[ \Gamma(z, \varphi^{C_\zeta}) \tag{2.28} \]

holds with $\zeta$-high probability. It is clear that (2.27) follows from (2.28) and the fact that

\[ |\partial_z G_{ij}| \le N^2, \qquad \eta > N^{-1}. \tag{2.29} \]

Note $\Gamma(z, K) = (\Omega_o^c \cup B) \cap (\Omega_d^c \cup B) \cap (\Omega_Z^c \cup B)$. First we shall prove that $\Omega_o^c \cup B$ holds
with $\zeta$-high probability. Using Lemma 1.6, Equation (2.17) and the fact that $|G|^2 = G^* G$,
we infer that there exists a constant $C_\zeta$ such that with $\zeta$-high probability,

A.(z) ^ C\z\ max | (x, g^-h,)\ ^ ^^^^ ^ ^ ^"^^^ '^'^'l')''' 



^^^c|,|^__^, in B^{z) (2.30) 

where in the last step we used the identity ?7~^ 

^Y^giij) = Using the identity 

TrGm(^) - Tr^m(^) = (2.31) 
Equation (2.22) and 53(z~^) = ?7|2;|~^ we deduce that with ^-high probability 



^m, + A + Al , 1 



For the above choice of C^, for z G S(3C^), with '^rric ^ 0(1), the bound 



c /Smc + A , 1 



Ao(^)^^^S/ +^ + o(A,) in B^(^) (2.32) 

holds with C-high probability. From (2.32) and (2.21) it follows that ri^UB holds with C-high 
probability. 

A similar argument using Lemma 1.6 will give 



(x„^«x,)-l:Tr^« 



^ in r(^)(2.33)
holds with $\zeta$-high probability, implying that

\[ \max_i |Z_i| \le \varphi^{C_\zeta} \Psi \]

and therefore $\Omega_Z^c \cup B$ holds with $\zeta$-high probability.

Finally notice that $\max_i |G_{ii} - m| \le \max_{i \ne j} |G_{ii} - G_{jj}|$. From (2.16) we obtain that

IG,,-G„I<I— ' 



^-z - z (xj, ^^(z) Xj) -z- z (xj, (5^i\z) Xj) 
|G..G,,|(|Z.-Z,| + M|Tr^«-Tr^(^-)|) 



^ + A^ + iV-i) in B\z) 

holds with $\zeta$-high probability, where the last inequality follows from (2.33), (1.33), (2.22)
and (2.23). Thus we have shown that $\Omega_d^c \cup B$ holds with $\zeta$-high probability and the lemma
is proved. □

On the other hand, in the case of $\eta \sim 1$, a result similar to Lemma 2.9 holds without the
assumption of $B^c$.

Lemma 2.10. For any $\zeta > 0$, there exists a constant $C_\zeta$ such that the event

\[ \bigcap_{z \in S(0),\, \eta \sim 1} \Omega(z, \varphi^{C_\zeta})^c \tag{2.34} \]

holds with $\zeta$-high probability.

Proof. From (2.29) we see that we only need to prove (2.34) for fixed z. First we note 
in this case, i.e., 77 ~ 1 , we have Qrric ~ 1 and from Lemma 2.8 we have A = 0(1) and 
therefore 

^ ~ _ (^2.35) 

As in (2.30) and Lemma 2.8 we obtain that 



with $\zeta$-high probability. The estimate for $Z_i$ can be proved as in (2.33) using Lemma 2.8.
The estimate for $\Omega_d$ (see (2.26)) can also be proved similarly using the identity

$\operatorname{Tr}\mathcal{G}^{(i)} - \operatorname{Tr}\mathcal{G}^{(j)} = \operatorname{Tr}G^{(i)} - \operatorname{Tr}G^{(j)} = O(\eta^{-1})$,

which follows from Cauchy's interlacing theorem of eigenvalues, i.e.,

$|m - m^{(i)}| \le (N\eta)^{-1}$ (2.36)

and the proof is finished. □
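The interlacing bound (2.36) can be checked numerically. The following is a minimal sketch, not part of the proof: it assumes a Gaussian $X$ with entry variance $1/M$, and the helper name `resolvent_trace_gap` is hypothetical. It compares $\operatorname{Tr}G$ with $\operatorname{Tr}G^{(i)}$, where $G^{(i)}$ is built from $X$ with its $i$-th column removed; dividing the gap by $N$ gives $m - m^{(i)}$.

```python
import numpy as np

def resolvent_trace_gap(X, i, z):
    """Return Tr G - Tr G^{(i)} for G = (X^T X - z)^{-1}.

    G^{(i)} is the resolvent built from X with column i removed
    (a hypothetical helper, used only for this illustration).
    """
    N = X.shape[1]
    G = np.linalg.inv(X.T @ X - z * np.eye(N))
    X_minor = np.delete(X, i, axis=1)
    G_minor = np.linalg.inv(X_minor.T @ X_minor - z * np.eye(N - 1))
    return np.trace(G) - np.trace(G_minor)

# Assumed toy sizes; entries are centered with variance 1/M.
rng = np.random.default_rng(0)
M, N = 60, 40
X = rng.standard_normal((M, N)) / np.sqrt(M)
z = 1.0 + 0.05j

gap = resolvent_trace_gap(X, 3, z)
# |Tr G - Tr G^{(i)}| <= 1/eta, hence |m - m^{(i)}| <= 1/(N*eta).
print(abs(gap) / N, 1.0 / (N * z.imag))
```

The bound used here is deterministic, so it holds for any real matrix $X$ and any $z$ with $\Im z > 0$, not only for the random sample above.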

2.3. Self consistent equations. In Section 2.2, we have bounded $\Lambda_o$ and $\max_i |G_{ii} - m|$ in
terms of $m_c$, $\eta$ and $\Lambda$ in $\mathbf{B}^c$ (we do not need the event $\mathbf{B}^c$ when $\eta \sim 1$). In this subsection, we
will give the desired bound for $\Lambda$ and show that the event $\mathbf{B}^c$ holds with $\zeta$-high probability.

First we give the bound for $\Lambda$ in the case of $\eta \sim 1$.

Lemma 2.11. For any $\zeta > 0$, there exists a constant $C_\zeta$ such that

$\bigcap_{z \in S(0),\, \eta = 10(1+d)} \{\Lambda(z) \le \varphi^{C_\zeta} N^{-1/2}\}$ (2.37)

holds with $\zeta$-high probability.

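Proof. By the definition of $Z_i$ (in Equation (2.2)) and (2.16),

$(G_{ii}(z))^{-1} = -z - \frac{z}{M}\operatorname{Tr}\mathcal{G}^{(i)}(z) - Z_i$. (2.38)

Using (2.31) and (2.36), we obtain that if $\eta \sim 1$,

$\left| \frac{z}{M}\operatorname{Tr}\mathcal{G}^{(i)} - zd\,m(z) + 1 - d \right| \le CN^{-1}$. (2.39)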

Together with |Zj| ^ ip^^'^ (see (2.34)), the estimate (2.39) implies that 



m[z] 



It thus follows that $|m(z)| \sim 1$ for $\eta \sim 1$ with $\zeta$-high probability. Then using the fact that
$\sum_i (G_{ii} - m) = 0$, we obtain that

$\frac{1}{N}\sum_i (G_{ii}(z))^{-1} = m^{-1}(z) + O\big(\max_i |G_{ii} - m|^2\big)$.

Recall $\mathcal{D}$ in (2.13). Using (2.38), (2.35) and the bound $|Z_i| + |G_{ii} - m| \le \varphi^{C_\zeta}\Psi$ (see (2.34)),
we have

$\mathcal{D}(m) = \delta(z)$, $|\delta(z)| \le O(\varphi^{C_\zeta} N^{-1/2})$.
The two solutions $m_1, m_2$ of the equation $\mathcal{D}(m) = \delta(z)$ for a given $\delta(\cdot)$ are given by

$m_{1,2} = \frac{\delta(z) + 1 - d - z \pm i\sqrt{(z - \lambda_{-,\delta})(\lambda_{+,\delta} - z)}}{2dz}$, (2.40)

$\lambda_{\pm,\delta} = 1 + d \pm 2\sqrt{d - \delta(z)} - \delta(z)$, $|\lambda_{\pm,\delta} - \lambda_{\pm}| = O(\delta)$.

Therefore, we obtain $m = m_1$ or $m_2$. It is easy to see that $|m_1 - m_2| \ge O(1)$, since $\eta \sim 1$. Since
$m(z)$ is continuous w.r.t. $E$ (for fixed $\eta$), $m = m_1$ (say) at one value of $E$ implies that $m = m_1$ for all
$E = O(1)$. Using this fact and $\Im m > 0$, we obtain that

$m(z) = \frac{\delta(z) + 1 - d - z + i\sqrt{(z - \lambda_{-,\delta})(\lambda_{+,\delta} - z)}}{2dz}$.

Comparing with (1.10), we obtain (2.37). □
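For orientation, the unperturbed case $\delta = 0$ of the quadratic above can be checked directly. The sketch below makes two assumptions: the self-consistent equation is $z d\, m^2 + (z + d - 1)m + 1 = 0$, and the deviance is normalized as $(1 - z - d - zdm) - m^{-1}$. Definition (2.13) is not reproduced in this section, so only the zero set of this expression is used. The function names are hypothetical.

```python
import cmath
import math

def mp_edges(d):
    """Edges lambda_-, lambda_+ = (1 -/+ sqrt(d))^2 of the Marcenko-Pastur law."""
    root_d = math.sqrt(d)
    return (1 - root_d) ** 2, (1 + root_d) ** 2

def mp_stieltjes(z, d):
    """Root of z*d*m^2 + (z + d - 1)*m + 1 = 0 with positive imaginary part."""
    disc = cmath.sqrt((z + d - 1) ** 2 - 4 * z * d)
    roots = [(1 - d - z + sign * disc) / (2 * d * z) for sign in (1, -1)]
    return max(roots, key=lambda m: m.imag)

def deviance(m, z, d):
    """Assumed normalization of the deviance; it vanishes exactly at m = m_c."""
    return (1 - z - d - z * d * m) - 1 / m

z, d = 1.3 + 0.2j, 0.5
m_c = mp_stieltjes(z, d)
lam_minus, lam_plus = mp_edges(d)
print(m_c, abs(deviance(m_c, z, d)))
```

Selecting the root by the sign of its imaginary part mirrors the argument in the proof, where $\Im m > 0$ is what singles out one of the two solutions.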

Now combining (2.34) with (2.37), we have proved that for any $\zeta > 0$, there exists a
constant $C_\zeta$ such that, for $\eta = 10(1 + d)$, Equation (2.1) holds with $\zeta$-high probability. It
immediately follows that the event

$\bigcap_{z \in S(0),\, \eta = 10(1+d)} \mathbf{B}^c(z)$ (2.41)

holds with $\zeta$-high probability for any $\zeta > 0$.

Now we prove (2.1) for general $\eta > 0$. Recall the deviance function $\mathcal{D}$ from (2.13), $Z_i$ from
(2.2) and set

$[Z] = \frac{1}{N}\sum_{i=1}^{N} Z_i$. (2.42)

Recall the set $\mathbf{B}(z)$ from (2.10) and $\Gamma(z, K)$ from Lemma 2.9.

Lemma 2.12. Fix $1 \le K \le (\log N)^{-1}(N\eta)^{1/2}$. Then, on the set $\Gamma(z, K)$, we have the
bound

$|\mathcal{D}(m)| \le |[Z]| + O(K^2\Psi^2) + \infty \mathbf{1}_{\mathbf{B}(z)}$.

Proof. Using (2.16), (2.22), (2.31) and the definition of $m_c$, on the set $\Gamma(z, K)$, we obtain
a more precise version of (2.38) (we denote $\Omega(z, K)$ by $\Omega$)

$G_{ii}(z)^{-1} = m_c(z)^{-1} + zd[m_c(z) - m(z)] - Z_i + O(K^2\Psi^2) + O(N^{-1})$ in $\mathbf{B}^c \cap \Omega^c$.

Then

$G_{ii}^{-1} - m^{-1} = \mathcal{D}(m) - Z_i + O(K^2\Psi^2) + O(N^{-1})$ in $\mathbf{B}^c \cap \Omega^c$ (2.43)
and averaging over i yields



N 



i=l 



It follows from the assumptions $K \ll (N\eta)^{1/2} \le O(\Psi^{-1})$ that $G_{ii} - m = o(1)$. Expanding the
left hand side and using the fact that $\sum_i (G_{ii} - m) = 0$,



N N ^ ^ N 



j=l j=l «=1 j=l 

Together with (2.23) and (2.8), it follows that

$\frac{1}{N}\sum_{i=1}^{N}\big(G_{ii}^{-1} - m^{-1}\big) \le C(K\Psi)^2$ in $\mathbf{B}^c \cap \Omega^c$. (2.44)

Now the lemma follows from (2.43) and (2.44). □

Lemma 2.13. Let $K, L > 0$ be two numbers such that $\varphi^L \ge K^2(\log N)^4$ and let $A$ be an
event given by

$A \subset \bigcap_{z \in S(L)} \Gamma(z, K) \cap \bigcap_{z \in S(L),\, \eta = 10(1+d)} \mathbf{B}^c(z)$. (2.45)

Suppose that, in $A$, we have the bound

$|\mathcal{D}(m)(z)| \le \delta(z) + \infty \mathbf{1}_{\mathbf{B}(z)}$, $\forall z \in S(L)$,

where $\delta : \mathbb{C} \to \mathbb{R}_+$ is a continuous function, decreasing in $\Im z$, and $|\delta(z)| \le (\log N)^{-8}$. Then
for some constant $C > 0$, the bound

$|m(z) - m_c(z)| = \Lambda(z) \le C(\log N)\frac{\delta(z)}{\sqrt{\kappa + \eta + \delta}}$, $\forall z \in S(L)$, (2.46)

holds in $A$ and

$A \subset \bigcap_{z \in S(L)} \mathbf{B}^c(z)$. (2.47)

Remark 2.14. Equation (2.45) says that if $\Im z = 10(1 + d)$, then $A \subset \Omega(z, K)^c$, i.e., $A$
is not in the bad sets of such $z$; (2.47) implies that $A$ is not in the bad sets of all $z \in S(L)$.
The difficulty in the proof of Lemma 2.13 stems from the fact that our hypothesis yields the
bound $\mathcal{D}(m) \le \delta(z)$ only in the set $\mathbf{B}^c$ but we need to prove (2.46) for both $\mathbf{B}$ and $\mathbf{B}^c$.
Proof. Let us first fix $E$ and define the set

$I_E = \{\eta : \Lambda_o(E + i\eta') + \Lambda_d(E + i\eta') \le \frac{1}{\log N},\ \forall \eta' \ge \eta,\ E + i\eta' \in S(L)\}$.

We first prove (2.46) for all $z = E + i\eta$ with $\eta \in I_E$. Define

$\eta_1 = \sup_{\eta \in I_E}\{\eta : \delta(E + i\eta) \ge (\log N)^{-1}(\kappa + \eta)\}$.

Since $\delta$ is a continuous decreasing function of $\eta$ by assumption, $\delta(E + i\eta) \ge (\log N)^{-1}(\kappa + \eta)$
for $\eta \le \eta_1$. Let $m_1$ and $m_2$ be the two solutions of the equation $\mathcal{D}(m) = \delta(z)$ as given in
(2.40). Note that by assumption we do have $|\mathcal{D}(m)| \le \delta(z)$ for $z = E + i\eta$ and $\eta \in I_E$, since we
are in $\mathbf{B}^c(z)$. Then it can be easily verified that

$|m_1 - m_2| \ge C\sqrt{\kappa + \eta}$, $\eta \ge \eta_1$, (2.48)

$|m_1 - m_2| \le C(\log N)\sqrt{\delta(z)}$, $\eta \le \eta_1$.

The difficulty here is that we don't know which of the two solutions mi,m2 is equal to m. 
However for rj = 0(1), we claim that m = mi. For 77 = 0(1), |m — m^ = A ^ A^^ -C 1. Also 
a direct calculation using (2.40) gives 

|mi-m,|=0^i^«-^. (2.49) 

y^K + T] \ogN 

Since $|m_1 - m_2| \ge C\sqrt{\kappa + \eta}$ for $\eta = O(1)$ (see (2.48)), it immediately follows that $m = m_1$ for
$\eta = O(1)$. Furthermore since the functions $m_1, m_2$ and $m$ are continuous and since $m_1 \ne m_2$
for $\eta \ge \eta_1$, it follows that $m = m_1$ for $\eta \ge \eta_1$. Thus for $\eta \ge \eta_1$,

\m{z) - m,{z)\ = \mi{z) - m,{z)\ <: C^kL= ^ C- ^^""^ 



where in the last step we have used $\delta \le \kappa + \eta$.

For $\eta \le \eta_1$, we take advantage of the fact that the difference $|m_1 - m_2|$ is of the same order
as the middle term of (2.49). Indeed, for $\eta \le \eta_1$, if $m = m_2$ (say), then using (2.48)

|m — md ^ |m2 — mil + |mi — md ^ {log N)\/S{z) ^ C (log A^) = 

y/K + T] + 6 

verifying (2.46) for $\eta \in I_E$.

From the above computations for $\eta \sim 1$, we know $I_E \ne \emptyset$. Now we prove that $I_E$ is
exactly the desired region, i.e., $[\varphi^L N^{-1}, 10(1 + d)]$, and this will verify (2.47). We argue
by contradiction. Indeed, assume that $I_E \ne [\varphi^L N^{-1}, 10(1 + d)]$. Let $\eta_0 = \inf I_E$. Then the
continuity assumption yields that

$\Lambda_o(z_0) + \Lambda_d(z_0) = (\log N)^{-1}$, $z_0 = E + i\eta_0$ (2.50)

and thus $\Lambda(z_0) \le \Lambda_d(z_0) \le (\log N)^{-1}$. On the other hand, from the calculations done above
we deduce that (2.46) holds for $\eta \in I_E$ and thus

A{zo)^{logNy\ (2.51) 

By definition

$\{\Lambda_o(z_0) + \Lambda_d(z_0) = (\log N)^{-1}\} \cap \Gamma(z_0) \subset (\Omega_o(z_0) \cup \Omega_d(z_0))^c$,

and therefore

$\Lambda_o(z_0) + \max_k |G_{kk}(z_0) - m(z_0)| \le CK\Psi(z_0)$.



From the assumption cp^ ^ K\\ogN)\ we have ^{zq) ^ + ^ < ^-^(logA^)-^ 

which immediately implies that Ao{zo) + max^ |G'fcfc(^o) ^ "^(^o)| ^ (log^)~^- Using this 
estimate and (2.51) we deduce that 

Ao(2o) + Ad(2o) ^ Ao(2o) + max \Gkk{zo) - m{zo)\ + A < log A^"^ 

k 

which contradicts (2.50) and therefore (2.47) is verified. This concludes the proof of the 
lemma. □ 

Now we complete the proof of Theorem 2.1. 

Proof of Theorem 2.1. From (2.33), Lemma 2.9 and 2.12, it follows that for any ( > 
0, there exist constants C^, and that 

|P(m)(z)| ^ (^^c^ + cx)1b(,), VzgS(Cc) 
holds on the event given by 

A^= f] r(^,y,^c). (2.52) 

Choosing a larger C^, applying Lemma 2.13 with 

A = A^n Pi B'^iz) 

z&S(0),r)=10{l+d) 
and S{z) = ip'"'^{Nr])~^^'^, we obtain that

A{z) <:^^i{Nr])-^/\ V^eS(Cc) (2.53) 
holds in $A$. Furthermore, (2.47) implies that

$A \subset \bigcap_{z \in S(C_\zeta)} \mathbf{B}^c(z)$. (2.54)

This observation gives that $\Lambda(z) \le \Lambda_d(z) = o(1)$ in $A$ and $\Psi \le C(N\eta)^{-1/2}$ in $A$. Now
since both $A_0$ and $\bigcap_{z \in S(0),\, \eta = 10(1+d)} \mathbf{B}^c(z)$ hold with $\zeta$-high probability (proved respectively in
Lemma 2.9 and (2.41)) it follows that the event $A$ holds with $\zeta$-high probability. Now from
the observation (2.54) we see that $\Omega(z, \varphi^{C_\zeta})^c$ holds with $\zeta$-high probability. Together with
$\Psi \le C(N\eta)^{-1/2}$ in $A$, we obtain (2.1). This completes the proof of Theorem 2.1. □

3. Strong bound on [Z]. For proving Theorems 1.3 and 1.5, the key input is the following
lemma, which gives a much stronger bound on [Z]. It is the main result of this
section:

Lemma 3.1. Let K,L > 0, such that (p^ ^ i^^(logA^)^. Suppose for some event 

S^ae^W in^,K)nB'^iz)), 

we have 

A{z) ^ A{z), \/z e S{L) 
where A{z) is some deterministic number and P(H'^) ^ ^-pi^os^f yjufi 

KjX (logiVK)-V^/'. (3.1) 
Then there exists S' such that P(H') ^ 1 — |e~^ and for any z G S{L), 



\[Z]\^Cp'K'^>\ *:=\/^^^, in S'. (3.2) 

Remark 3.2. In the application of the above lemma in Section 4, we will set pn and 
K = 0{lp'^^^^). This lemma is analogous to Lemma 5.2 in [13](withp = 0(1)), Corollary 
in [14] and Lemma 4-i in [7], which are used in the contexts of Wigner matrices and sparse 
matrices. The basic idea is to utilize the fact that the entries of Green's function are weakly
correlated. In this work, however, we give a simple, general lemma on the cancellation of weakly
coupled random variables, which need not have the special structure of Green's function and
is thus useful in more general contexts (for applications to non-Hermitian matrices, see [3]).
3.1. Large deviations theory for weakly hierarchically coupled random variables. First, we
introduce the following general theorem, which is similar to Theorem 5.6 of [6]
and to a similar result in [14]; our result is more general, focuses only on weakly
coupled random variables, and is thus independent of the structure of the matrix ensembles.
For this reason it has been successfully used in [3] for the local circular law.

Let $\mathcal{I}$ be a finite set which may depend on $N$ and

$\mathcal{I}_i \subset \mathcal{I}$, $1 \le i \le N$.

Let $\{x_\alpha, \alpha \in \mathcal{I}\}$ be a collection of independent random variables and $Z_1, \ldots, Z_N$ be random
variables which are functions of $\{x_\alpha, \alpha \in \mathcal{I}\}$. Let $\mathbb{E}_i$ denote the expectation value with respect to
$\{x_\alpha, \alpha \in \mathcal{I}_i\}$. Define the commuting projection operators

$Q_i = 1 - \mathbb{E}_i$, $P_i = \mathbb{E}_i$, $P_i^2 = P_i$, $Q_i^2 = Q_i$, $[Q_i, P_j] = [P_i, P_j] = [Q_i, Q_j] = 0$

and for $A \subset \{1, 2, \ldots, N\}$

$Q_A := \prod_{i \in A} Q_i$, $P_A := \prod_{i \in A} P_i$.

We use the notation

$[Z] = \frac{1}{N}\sum_{i=1}^{N} Z_i$.

Theorem 3.3 (Large deviations theory for weakly hierarchically coupled random variables).
Let $\Xi$ be an event and $p$ an even integer, which may depend on $N$. Suppose the following
assumptions hold with some constants $C_0, c_0 > 0$.

(i) (Bound on $Q_A Z_i$ in $\Xi$). There exist deterministic positive numbers $\mathcal{X} < 1$ and $\mathcal{Y}$ such
that for any set $A \subset \{1, 2, \ldots, N\}$ with $i \in A$ and $|A| \le p$, $Q_A Z_i$ in $\Xi$ can be written
as the sum of two new random variables

$\mathbf{1}(\Xi)(Q_A Z_i) = \mathcal{Z}_{i,A} + \mathbf{1}(\Xi) Q_A \mathbf{1}(\Xi^c)\tilde{\mathcal{Z}}_{i,A}$ (3.3)

and

$|\mathcal{Z}_{i,A}| \le \mathcal{Y}(C_0\mathcal{X}|A|)^{|A|}$, $|\tilde{\mathcal{Z}}_{i,A}| \le \mathcal{Y}N^{C_0|A|}$. (3.4)

(ii) (Rough bound on $Z_i$).

$\max_i |Z_i| \le \mathcal{Y}N^{C_0}$. (3.5)

(iii) ($\Xi$ is a high probability event).

$\mathbb{P}[\Xi^c] \le e^{-c_0(\log N)^{3/2}p}$. (3.6)

Then, under the assumptions (i), (ii) and (iii) above, we have

(3.7)

for some $C > 0$ and any sufficiently large $N$.

Remark 3.4. This lemma is based on a joint work with Professor H. T. Yau and we
thank him for kindly allowing us to include it here.

The intuition behind Theorem 3.3 is the following. If the $Z_i$'s are totally independent, i.e.,
$Q_A Z_i = 0$ if $\exists j \in A$ and $i \ne j$, we see that $\sum_i Z_i$ is less than $\sum_i |Z_i|$ by a factor $N^{-1/2}$; this
is also a consequence of the usual CLT. In this case $Z_i$ only depends on $\{x_\alpha, \alpha \in \mathcal{I}_i\}$. For
the general case considered in Theorem 3.3, $Z_i$ also weakly depends on the sets $\{x_\alpha, \alpha \in \mathcal{I}_j\}$ for
$i \ne j$. Here $Q_j Z_i$ can be considered as the set $\{x_\alpha, \alpha \in \mathcal{I}_j\}$ "acting" on $Z_i$, and $Q_k Q_j Z_i$ as the
action of $\{x_\alpha, \alpha \in \mathcal{I}_k\}$ on the action of $\{x_\alpha, \alpha \in \mathcal{I}_j\}$ on $Z_i$, and so on. This lemma
shows that if the "action" is hierarchical, then indeed $\sum_i Z_i$ is much less than $\sum_i |Z_i|$ in the
sense of (3.7).
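The fully independent case mentioned above is easy to illustrate numerically. The sketch below is only a toy check under stated assumptions: it draws independent centered variables, uniform on $[-1, 1]$ (an arbitrary choice), and compares the size of their average with the size of a single term. The function name `average_vs_max` is hypothetical.

```python
import random

def average_vs_max(n, seed=0):
    """Return (|average|, max |Z_i|) for n independent centered uniform variables."""
    rng = random.Random(seed)
    z = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return abs(sum(z) / n), max(abs(v) for v in z)

n = 10_000
avg, biggest = average_vs_max(n)
# The average is of order n^{-1/2}, while individual terms are of order one.
print(avg, biggest, n ** -0.5)
```

This covers only the independent case; the content of Theorem 3.3 is that a comparable gain survives when the $Z_i$ are weakly and hierarchically coupled.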



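Before we give the proof of Theorem 3.3, we introduce a trivial but useful identity

$\prod_{i=1}^{n}(x_i + y_i) = \sum_{s=1}^{n+1}\Big(\prod_{i=1}^{s-1} x_i\Big)\, y_s \Big(\prod_{i=s+1}^{n}(x_i + y_i)\Big)$ (3.8)

with the conventions that $\prod_{i \in \emptyset} = 1$ and $y_{n+1} = 1$. It implies that

$\Big|\prod_{i=1}^{n}(x_i + y_i) - \prod_{i=1}^{n} x_i\Big| \le n \max_i |y_i| \Big(\max_i |x_i + y_i| + \max_i |x_i|\Big)^{n-1}$.

For any $1 \le k \le n$, it follows from $\prod_{i=1}^{n}(x_i + y_i) = (x_k + y_k)\prod_{i \ne k}(x_i + y_i)$ and Equation
(3.8) that

$\prod_{i=1}^{n}(x_i + y_i) = (x_k + y_k)\sum_{s \ne k,\, s=1}^{n+1}\Big(\prod_{i \ne k,\, i=1}^{s-1} x_i\Big)\, y_s \Big(\prod_{i \ne k,\, i=s+1}^{n}(x_i + y_i)\Big)$. (3.9)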
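A telescoping identity of this type can be sanity-checked numerically. The sketch below assumes one standard form of the expansion: factors $x_i$ to the left of the index $s$, a single $y_s$, factors $x_i + y_i$ to the right, and a final term equal to $\prod_i x_i$. The placement of the factors is an assumption made for this illustration, and the function name is hypothetical.

```python
from math import prod

def telescoped_product(x, y):
    """Sum over s of (prod_{i<s} x_i) * y_s * (prod_{i>s} (x_i + y_i)).

    The last term (s = n in 0-based indexing) has no y factor and equals prod(x).
    """
    n = len(x)
    total = 0.0
    for s in range(n + 1):
        left = prod(x[:s])                      # empty product is 1
        middle = y[s] if s < n else 1.0         # convention for the extra term
        right = prod(a + b for a, b in zip(x[s + 1:], y[s + 1:]))
        total += left * middle * right
    return total

x = [0.5, -1.25, 2.0, 0.75]
y = [0.1, 0.2, -0.3, 0.4]
direct = prod(a + b for a, b in zip(x, y))
print(direct, telescoped_product(x, y))
```

Dropping the final term gives the difference $\prod(x_i + y_i) - \prod x_i$ as a sum of $n$ terms, each containing one factor $y_s$, which is where a bound proportional to $n \max_i |y_i|$ comes from.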
Proof of Theorem 3.3. First, by definition, we have



E 



ji,...,jp a=l 



For fixed $j_1, \ldots, j_p$, let $T_a = Q_{j_a}Z_{j_a}$. Now using (3.9) with $k = 1$, $x_i = P_{j_1}T_i$ and
$y_i = Q_{j_1}T_i$ (noting that $x_i + y_i = T_i$), we have



a=l s=2 



We define Aa,s ■= '^{a<s,a^i}{ji} and B^^s ■= la=s{ji}; thus 5^,,^ = {jj if a = s otherwise 
^a,s = 0- It is clear that Ai^g = Bi s = 0- Then 



p+i 



a=l s=2 a 

For generalization, we replace s with si and write it as 

P p+i 



= 5^ i(.i ^ 1) n^^-i^^-i^" 



a=l 



and 

Aa,si = {ji ■ a < si,a ^ 1}, Ba,si = {ji ■ Si = a} . 

Iterating for 1 ^ ji, j2, ■ ■ ■ , jp ^ N, we have 

p p+i 

where s denotes si, S2 ■ ■ ■ , Sp and /1q,^s and Ba,s are defined as 

^a,s = {ji ■ a < Si,aj^ i}, Ba,s = {ji ■ Si = a} . 

Then it follows that 



a=l 



^ (2p)Pmax JJl(si ^ 



(3.10) 
Now to prove (3.7), it only remains to show that for any $\{j_1, \ldots, j_p\}$ and $s = \{s_1, s_2, \ldots, s_p\}$
such that $s_i \ne i$, we have



^{Cpr^yPX'\ t:=|{j\,...,j,}| 



(3.11: 



For simphcity, we denote Aa, s and Ba, s by and Ba and denote the characteristic function 
by S. Thus we need to show that 



^{Cpy^y^X'\ t:=\{j,,...,jp}\. 



(3.12) 



Since Ti = Qj^Ti and the operators Pa^ and QbJs commute, we have 



(3.13) 



Hence we can assume that ji ^ P^a^iAa, and so 1 < si ^ (see (3.10)), ji G Uc^i-Bq. 
Similarly for jj, we have jj G Uq^j-Bq, here i = 2, . . . p. Recall that ja ^ -Bq. With these two 
constraints, the -B^'s satisfy the inequality 



p + t^J2\B^U{ja}\^2t, t:= \{ji,...,Jp}\ . 

a 

Now it only remains to prove (3.12) under the condition (3.14). First, we write 

p 

EUPa^QbT^ = E J](P^„g5^%), B^ := B^ U {jj • 



(3.14) 



Using (3.8) with x = PSQZ and y = PS'^QZ {x + y = PQZ ), we have 

p p+l / 1 p 



0=1 



=1 \ i=l 



i=s+l 



(3.15) 



First for s ^ p, we use the following formula. For any bounded functions / and h 

E\hiPE^Qf)\ ^ \\h\UiE^Qf)h ^ y¥{E^ ll/l|oo||/^||oo . (3.16) 
Let



s-l 



/i=n(^^.(s)^?B,%) n f=^js, p=pa., q=qb. 



i=l 



i=s+l 



By (3.5) and p ^ 1, we have 

\h\ <: yp-^N^p, l/l ^ yN^ . 

Then with (3.6), we have proved that (see (3.15)) 



s=l \ i=l 



i=s+l 



Thus the contribution from the above term can be neglected in proving (3.12). Then it only 
remains to bound the r.h.s of (3.15) in the case s = p + 1, i.e., we need to show that 



a=l 



^{CprPyPX'\ t:=|{ji,...,j,}| 



(3.17) 



under the assumption (3.14). Using (3.3) and (3.8), with x = PEZ and y = P'EQE'^Z we 
can write the l.h.s. of (3.17) as 



(3.18) El[{PAEQg^Z,J 



s=l \ i=l 



i=s+l 



Now we repeat the argument for (3.15). For s ^ p, one can use the following formula similar 
to (3.16). For any bounded function / and h 



E\hiPEQEJ)\ ^ \\h\U\iEJ)h ^ v/p(S^ 



oo II ^ II oo 



Let 



s—l p 
1=1 i=s+l 
With the assumptions in (3.4) and (3.14), we know that the sum over $s \le p$ on the r.h.s. of (3.18) is bounded above
by

$\mathcal{Y}^p N^{Cp}\exp[-c(\log N)^{3/2}p]$,

which can be neglected in proving (3.17). For the main term, with $s = p + 1$ in the r.h.s. of
(3.18), using (3.4) and (3.14), we have



a=l 



and this completes the proof of Theorem 3.3. 



□ 



3.2. A stronger bound of [Z]. In this section we are going to apply the abstract CLT 
result proved in Theorem 3.3 to obtain a stronger bound on [Z]. We note that using (2.16) 
and (2.2), Z can be written as 



Zi — Qi 



-1 



1 Qi • 1 Pil Pi ■ -I^Xi • 



(3.19) 



Lemma 3.5. Let Zi = {Ga) ^, Pi and Qi defined as in (3.19). We assume that rj = 
^ for some C > 0. Suppose there exists an even integer p and an event S, such 
that P(H^) ^ e-P(^°s^)'^', and m E 



max|g,z,| ^ cyx, 



Aoiz) 



min,; IGiAz) 



^ CA" < 1, min \Gii{z)\ ^ y-\ p ^ 



C 



{\ogN)X 

(3.20) 



where X 1 and y are deterministic numbers. Then there exists S' with P((S')'^) ^ e ^ 
and in S', 



^ Cp^ [x^ + N-^) y 



(3.21) 



Proof of Lemma 3.5. We are going to apply Theorem 3.3. The claim given in (3.21) 
will follow from (3.7) and Markov's inequality. Using the hypothesis, one can easily verify 
(3.5) and (3.6) in the hypothesis of Theorem 3.3. It only remains to show that for i G A C 
{1, 2 ■ ■ ■ , A^} and ^ p there exist Zi^A and Zi^A 



i(s)(g^Zi) = z,^A + i(s)g^(s^)z,,A, ^ y{cx\A\y \ Zi^^ ^ yN'^^^\ (3.22) 
for some $C > 0$. By assumption, Equation (3.22) holds when $A = \{i\}$. Thus we assume that
$|A| \ge 2$. As in Lemma 5.1 in [6], let $\mathcal{A} = \mathcal{A}(H) = \mathcal{A}(X^\dagger X)$ be a function of $X^\dagger X$, and define



(A) 



s,u 



J2 (-l)l''U(^), A(^) := A ((X(^))t(X(^))) 

s/ucvcs 



for any S,U G {1,2, ■■■ , N}. Then we have 



s,u 



ucs 



By definition, (A)^'^ is independent of the j-th column of X, if j G S/U. Therefore, 



QsA = Qs (A) 



s,s 



In our case, 



Qa^^i = QiQA/{i}2^i = Qa (^'q~^ 



A/{^},A/{^} 



Now we choose 



Z,,A := 1(S)Q^S ( — 



A/{^},A/{i} 



Z, A := I — 



A/{^},A/{^} 



It is easy to prove the bound for Zi^A in (3.22) using its definition. For bounding Zi^^, it only 
remains to prove that for 2 ^ |A| ^ p^r 



Gi 



A/{i},A/{i} 



^y{CA:\A\) 



\A\ 



(3.23) 



To prove it, we first show that for |T| ^ p, 
max I G,-? I ^ C max I G,- 



min I G,- J'' I ^ c min I Gi 



with the constants C,c independent of N,i,j. We start from |T| = 1, i.e., T 
using (2.18) and the hypotheses of this lemma, we have 



(3.24) 
{k}. First 



GijGji 



+ (GS?)-^ = (l + 0(A'^))(Gin 



iJh-i 
It follows that



(fc)i 



GikGkj 



'^0 Q 



kk 



^Aoil + OiX)) . 



max\G\f\ ^ {l + 0{X))max\Gij\, mm\GP\ ^ {1 - 0{X)) mm \Gii\ . 

i,jj^k ■' i,j i^k i 

Then using induction on $|T|$ and the assumption $\mathcal{X}p \ll 1$, we obtain the desired result (3.24).

Now we return to prove (3.23) for the case $|A| = 2$. If $i \ne j$, using (2.18), (3.24) and
(3.20), we have

The general case has been proved in Lemma 5.11 of [6] (also see below), which gives that 

1 \ maxi j^T,TcA/{i} G]. 



G,J - V I 1^ , ,^m,\l^l+i ■ 



(mm^^j jciA/{i} ICj-pl) 



Together with (3.24) and (3.20), we obtain (3.23) for \A\ = 2. 

Finally, we need to point out that the definition of $G^{(V)}_{ij}$ ($i, j \notin V$) in [6] is different from
the one in this paper, though they are equivalent. In this paper, we define

$G^{(V)} = \big((X^{(V)})^\dagger (X^{(V)}) - z\big)^{-1}$

whereas in [6], it is defined as

$G^{(V)} = (H^{(V)} - z)^{-1}$

where $H^{(V)}$ is the minor of $H$ obtained by removing all $i$-th rows and columns of $H$ indexed
by $i \in V$. But one can see that if $H = X^\dagger X$ then $H^{(V)} = (X^{(V)})^\dagger (X^{(V)})$. Thus we finish the
proof of Lemma 3.5. □

Finally we give the proof of the main result of this section. 

Proof of Lemma 3.1. It is a special case of Lemma 3.5 with X = K'^/ and y = G for a 
constant G (possibly large, but independent of N). First the bound maxj \QiZi\ ^ GyX is 
proved in (2.33). By assumption if H C H^^gj.^^ K) fl 'Q''{z)), then 

Ao, Xd^Km ^K^! = X ^ GK{Nr]Y^/'^ < 1 
in $\Xi$. Thus we obtain

^"^^^ ^ CA" < 1, min \GJz)\ ^ . 



mim IGiiiz] 



Furthermore Equation (3.1) and $\eta \ge N^{-1}\varphi^L$ (since $z \in S(L)$) imply that $p \le C((\log N)\mathcal{X})^{-1}$
and the proof of Lemma 3.1 is finished. □

4. Strong Marcenko-Pastur law and rigidity of eigenvalues. In this section, our goal is
to prove Theorem 1.3 and Theorem 1.5. Let us first give a brief sketch of the proof strategy
for the main technical estimate (1.19). We will prove, by an induction on the exponent $r$,
that $\Lambda(z) \le (N\eta)^{-r}$ holds modulo logarithmic factors with high probability. Notice that
we have already proved this statement for $r = 1/4$ in Theorem 2.1. Lemma 2.13 asserts
that if this statement is true for some $r$, then it also holds for $(1 + r)/2$, assuming a bound on
$[Z]$. Now, an application of Lemma 3.1 will yield that the required bound for $[Z]$ holds with
high probability. Repeating the induction step $O(\log\log N)$ times, we will obtain that $r$
is essentially one, implying Theorem 1.3. However, we have to keep track of the increasing
logarithmic factors and the deteriorating probability estimates of the exceptional sets.
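The speed of this induction can be made concrete with a small sketch. It assumes that each step improves the exponent by the map $r \mapsto (1 + r)/2$, which is what a recursion of the form $\tilde\Lambda_{k+1} \approx \tilde\Lambda_k^{1/2}(N\eta)^{-1/2}$ gives when logarithmic factors are ignored. The map and the function name are assumptions made for illustration only.

```python
def iterate_exponent(r0, steps):
    """Apply the assumed update r -> (1 + r) / 2 and return the whole history."""
    history = [r0]
    for _ in range(steps):
        history.append((1 + history[-1]) / 2)
    return history

history = iterate_exponent(0.25, 10)
# The defect 1 - r halves at every step, so it equals (1 - r0) / 2**k after k steps.
for k, r in enumerate(history):
    print(k, r, 1 - r)
```

Since the defect $1 - r$ decays geometrically, of order $\log\log N$ steps are enough to make $(N\eta)^{1-r}$ bounded, which is consistent with the number of iterations quoted in the sketch of the proof.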

4.1. Proof of Theorem 1.3. We start by establishing (1.19) and (1.20). 

Proof of (1.19) and (1.20). Without loss of generality, we assume C ^ 1- Using Lemma 
2.9 and Lemma 2.1, for any C > 0, there exists such that 

Sic fl B^(z)nr(z,Cc) (4.1) 

holds with {C + 4)-high probability. Then from Lemma 2.12, we see that for z G S(3C^), 

\V{m){z)\ ^ (/.2Q^2 ^ in Si . (4.2) 

Let Ai = 1, so that A ^ Ai in Therefore, we can apply Lemma 3.1 with 

p = p^ = -log[l -P(Hi)]/(logiV)2 . 

Without loss of generality, we can assume that P(Si) is not too close to 1; otherwise, we can 
replace Si by a subset of itself. Then from Definition 1.2, it follows that 



p, = Cv^^'/ {log Nf 
We assume that ^ 6^ and therefore (3.1) holds. Then (3.2) gives that for z G S(3C<j),
there exists S2 such that 

S2 C Hi, P(H2) = 1 - e^^^ 

and 



\\Z\ \ ^ (^2Cc+llC^2^ 



, m ^2 • 



Nr] 

Since in S2 C Si, by (4.2), A ^ Ai and thus \1/ ^ \l/i in S2, and consequently 

|P(m)(.)K^^^c+n ^^;^ + ^\ in S2 . (4.3) 

Then applying Lemma 2.13, (2.47) shows that for z G S(3Cf) 

k{z) ^ k^{z) := ip'=^<+''^Ay\Nri)-'/\ in S2 . 
Now the proof proceeds via iterating the above process. Indeed, by choosing 

P2 = -log[l -P(S2)]/(logAr)2 = C^^+7(logAr)4 
we deduce that there exists S3 such that 

S3CS2, P(S3) = 1 - e-^'^ 

and for z G S(3C^) 

Now we iterate this process K times, K := loglog A^/(log 1.9) . For k ^ K,we infer that for 
some 

Sfc C Sfc_i, P(Sfc) = 1 - 

where 

Pk = -log[l - P(Sfc_i)]/(logiV)2 = Cv^^+7(logiV)2'= ^ 
and for z G S(3C^) 

A(.) ^ A,+i(.) := ^^c+6Ca1/2(^^)-i/2 ^ ^2C,+i2C^N^yiHi/2r^ S,+i . 
Note that

Thus for k = K, ioT z E S(3Cf) the bound 

A{Z) ^ Ak+l{z) ^ <^2Cc+12C(^^)-l+(l/2)- ^ ^2Cc+12C+l(^^)-l (4 4) 

holds with C-high probabihty and this completes the proof of (1.19). Furthermore, since 
Ek+i C Hi with (4.1), we obtain (1.20). □ 

Next we assume (1.21) holds and prove (1.22) first. 

Proof of (1.22). Using (1.20), we have for any i, 

max ^GJE + iif^^iN^^) . (4.5) 



By definition, 

' Gii = 



(A,-E)2 + r/2 

Then choosing $E = \lambda_\alpha$ and $\eta = \varphi^{C_\zeta}N^{-1}$, using (4.5), we deduce that for any index $\alpha$

which implies (1.22). Here Equation (1.21) guarantees that $\lambda_-/5 \le E \le 5\lambda_+$. □

Now to establish Theorem 1.3 all that remains is the proof of (1.21), which we give below.

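Proof of (1.21). The proof proceeds via establishing the following four steps.
• Step 1: For any $\zeta > 0$, there exists some $D_\zeta > 0$ such that

$\max\{\lambda_j : \lambda_j \le 5\lambda_+\} \le \lambda_+ + N^{-2/3}\varphi^{D_\zeta}$

and

$\min\{\lambda_j : \lambda_j \ge \mathbf{1}_{d>1}\lambda_-/5\} \ge \lambda_- - N^{-2/3}\varphi^{D_\zeta}$
hold with $\zeta$-high probability.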
• Step 2: Recall $n(E)$ in (1.11) and $n_c(E)$ in (1.12). We will show that

(n(Ei) - n{E,)) - ME,) - n,{E,)) | ^ ^(^"S^)^^''^ e„ E, e [ld>iA-/4, 4A+] 

(4.6) 

which implies that 

# {j : A, ^ [ld>iA-/5, 5A+]} ^ . (4.7) 

We note that although we only need (4.7) for (1.21), (4.6) will be used later to prove
Theorem 1.5.

• Step 3: Next, using the above two steps, we will show that $\max_j \lambda_j \le 5\lambda_+$ with $\zeta$-high
probability. This step will imply (1.21) in the case $d < 1$.

• Step 4: Finally, we show that, for $d > 1$, i.e., $N > M$, we have $\lambda_M \ge \lambda_-/5$ with $\zeta$-high
probability.

Step one of the proof of (1.21): By repeating the iteration in the proof of (1.21) one more time,
i.e., replacing $\tilde\Lambda_1$ in (4.3) with $\tilde\Lambda_{K+1}$ in (4.4), we obtain

^ _|_ 

|P(m)(.)K^^c^_J^ 

for some large C^. From (2.46) again, we obtain that for some ^ 1 

D, ^ ^ ferric 



For any E such that E ^ X+ + N-'^/^Lp'^^< , and 

:= ^-^ciV-V2^i/4^ K = E-\+ 

(thus K ^ A^"^/^v9^^«), it is easy to check that 



(4.9) 



Using (2.20) and (4.9), we have 



'^mJz) = C^ (4.10) 

'K 
which implies

Therefore, 5. Together with (4.8) and (4.9), we have 

Combining (4.10) and the last inequahty of (4.9) yields 
and therefore we can conclude that 

miz) <^ . 

Note that if $\Im m(z) < (2N\eta)^{-1}$ (recall $z = E + i\eta$), then the number of eigenvalues in
the interval $[E - \eta, E + \eta]$ is zero, which is implied by the following observation:
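The counting argument behind this statement can be illustrated with a short sketch. It relies on the elementary inequality $\Im m(E + i\eta) = \frac{1}{N}\sum_j \frac{\eta}{(\lambda_j - E)^2 + \eta^2} \ge \frac{1}{2N\eta}\,\#\{j : |\lambda_j - E| \le \eta\}$, which is one standard way to obtain such a statement and is stated here as an assumption about the omitted display. The eigenvalues below are made-up numbers and the function names are hypothetical.

```python
def imag_stieltjes(eigs, E, eta):
    """Imaginary part of m(E + i*eta) = (1/N) * sum_j 1 / (lambda_j - E - i*eta)."""
    return sum(eta / ((lam - E) ** 2 + eta ** 2) for lam in eigs) / len(eigs)

def count_in_window(eigs, E, eta):
    """Number of eigenvalues in the closed interval [E - eta, E + eta]."""
    return sum(1 for lam in eigs if abs(lam - E) <= eta)

eigs = [0.1, 0.5, 0.52, 0.9, 1.7]   # hypothetical spectrum
E, eta = 0.5, 0.05
lower_bound = count_in_window(eigs, E, eta) / (2 * len(eigs) * eta)
print(imag_stieltjes(eigs, E, eta), lower_bound)
```

Each eigenvalue in the window contributes at least $1/(2\eta)$ to the sum, so a single eigenvalue in $[E - \eta, E + \eta]$ already forces $\Im m(z) \ge (2N\eta)^{-1}$.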



Since $\Im m(z) \ll (N\eta)^{-1}$ holds for any $E \ge \lambda_+ + N^{-2/3}\varphi^{D_\zeta}$, we have proved that for any $\zeta > 0$,
there exists some $D_\zeta > 0$ such that

$\max\{\lambda_j : \lambda_j \le 5\lambda_+\} \le \lambda_+ + N^{-2/3}\varphi^{D_\zeta}$

holds with $\zeta$-high probability. An analogous bound for the smallest eigenvalue can be proved
similarly.

Step two of the proof of (1.21): The proof is similar to that of Theorem 2.2 in [14]. The
strategy is to translate the information on the Stieltjes transform obtained in Theorem 1.3
to prove (4.6) on the location of the eigenvalues.

In the following lemma, $A_1, A_2$ represent two numbers with $|A_1| + |A_2| \le O(1)$. For any
$E_1, E_2 \in [A_1, A_2]$, and $\eta = N^{-1}$ we define

$f(\lambda) := f_{E_1, E_2, \eta}(\lambda)$

to be the characteristic function of $[E_1, E_2]$ smoothed on scale $\eta$, i.e., $f = 1$ on $[E_1 + \eta, E_2 - \eta]$,
$f = 0$ on $\mathbb{R} \setminus [E_1, E_2]$ and $|f'| \le C\eta^{-1}$, $|f''| \le C\eta^{-2}$.
Lemma 4.1. Let $\varrho^\Delta$ be a signed measure on the real line and $m^\Delta$ be the Stieltjes transform
of $\varrho^\Delta$. Suppose for some positive number $U$ (which may depend on $N$) we have

$|m^\Delta(x + iy)| \le \frac{CU}{Ny}$ for $y < 1$, $x \in [A_1, A_2]$. (4.12)

Then

$\Big|\int f_{E_1, E_2, \eta}(\lambda)\varrho^\Delta(\lambda)\,d\lambda\Big| \le \frac{CU|\log\eta|}{N}$. (4.13)



Proof of Lemma 4.1. For notational simplicity, we drop the $\Delta$ superscript in the
proof. Let $\chi(y)$ be a smooth cutoff function with support in $[-1, 1]$, with $\chi(y) = 1$ for
$|y| \le 1/2$ and with bounded derivatives. Using the Helffer-Sjöstrand functional calculus, we obtain



/(A) = 
Since / and x are real, 

j /(A)f^(A)dA 

(4.14) 



2-K 



wf"{x)x{y) + i{f{x) + iyf'{x))x'{y) 

X — X — iy 



'J 

Jl 


' (1/(^)1 


+c 


/ / 






+c 


/ / 







^ C I {\f{x)\ + \y\\fix)\)\x'iy)\\m{x + zy)\dxdy 
yf"{x)x{y) '^m{x + iy)dx dy 
yf"{x)x{y) '^m{x + iy)dx dy . 
Using (4.12), the first term can be estimated as 

(1/(^)1 + \y\\f'{x)\)\x'{y)\\m{x + zy)\dxdy ^ CU. (4.15) 



For the second term in r.h.s of (4.14), notice that from (4.12) it follows that, for any < 



y\ '^m{x + iy)\ ^ CU. 



(4.16) 



With |/"| ^ Cr/-2 and 



supp/'(x) C {\x - Ei\ ^ r/} U {|a; - ^ r/}. 



(4.17) 
we get



yf"{x)x{y) rn{x + iy)dx dy 



^ CU. 



Now we integrate the third term in (4.14) by parts first in x, then in y. Then we bound it 
with absolute value by 



C / r]\f{x)\\^m{x + ir])\dx + C / y\f{x)x'{y)^m{x + iy)\ (4.18) 
H — / / \^m{x + iy)\dx dy. 

V J Tj^y^l J suppf 

By using (4.12) and (4.17) in the first term, (4.15) in the second and (4.12) in the third, we 
have 

(4.18) ^CU + CUt]-^ I dx I \dy ^ CU\ logr/| . 



This finishes the proof of Lemma 4.1. 



□ 



We will apply Lemma 4.1 with $[A_1, A_2] \subset [\mathbf{1}_{d>1}\lambda_-/4, 4\lambda_+]$ and the signed measure $\varrho^\Delta$
equal to the difference of the empirical density and the MP law:

$\varrho^\Delta(d\lambda) = \varrho(d\lambda) - \varrho_c(\lambda)d\lambda$, $\varrho(d\lambda) := \frac{1}{N}\sum_i \delta(\lambda_i - \lambda)$.

Now we prove that (4.6) holds. By Theorem 1.3, if $y \ge y_0 := \varphi^{C_\zeta}/N$, the assumptions of
Lemma 4.1 hold for the difference $m^\Delta = m - m_c$ and $U = \varphi^{C_\zeta}$. For $y \le y_0$, set $z = x + iy$,
$z_0 = x + iy_0$ and estimate



/yo 
\dri {m{x + irj) — mc{x + 277)) |d?7. (4.19) 



Note that 



\dr^m{x + ir])\ =\j;^^drjGjj{x + ir]) 



'^m{x + it]), 



jk 
and similarly



\drjmc{x + ir])\ 



Qc{s) 



{s — X — ir]Y 



ds 



^ ;ds = -^rricix + irj). 



X — irj\ 



V 



Now we use the fact that the functions $y \to y\Im m(x + iy)$ and $y \to y\Im m_c(x + iy)$ are
monotone increasing for any $y > 0$ since both are Stieltjes transforms of a positive measure.
Therefore the integral in (4.19) can be bounded by ($z_0 = x + iy_0$)



d?7 

7] 



yo 



d?7 

7f 



(4.20) 



By definition, ^mc{x + iyo) ^ \mc{x + z?/o)| ^ C*- By the choice of yo and Theorem 1.3, 
we have 



S m(x + iyo) ^ 55mc(x + iyo) + 



^ c 



(4.21) 



with $\zeta$-high probability for any $\zeta > 0$. Together with (4.20) and (4.19), this
proves that (4.12) holds for $y \le y_0$ as well if $U$ is increased to $U = C\varphi^{C_\zeta}$.
The application of Lemma 4.1 shows that for any $\eta \ge 1/N$



C{\ogN)ip^c 
N 



(4.22) 



Using the fact that $y \to y\Im m(x + iy)$ is monotone increasing for any $y > 0$, we now use (4.21)
to deduce a crude upper bound on the empirical density. Indeed, for any interval $I :=
[x - \eta, x + \eta]$, with $\eta = 1/N$, we have

$n(x + \eta) - n(x - \eta) \le C\eta\,\Im m(x + i\eta) \le Cy_0\,\Im m(x + iy_0) \le \frac{C\varphi^{C_\zeta}}{N}$. (4.23)

Equations (4.22) and (4.23) yield (4.6) and we have achieved Step 2.



Step three of the proof of (1.21): Now we prove that $\lambda_1 \le 5\lambda_+$ holds with $\zeta$-high probability. Note
that there is nothing special about the number 5, and below we show that for some large $K$,

$\lambda_1 \le K\lambda_+$

with $\zeta$-high probability. Let

z = E + i7], T] = EN-"^'^ . (4.24) 
With (4.6) and choosing $E_1 = \lambda_-$ and $E_2 = K\lambda_+$, we have proved that there are at most
$\varphi^{O(1)}$ eigenvalues larger than $K\lambda_+$. Then by definition,



(T) ^ Cr] ip^c 



^m^^^\^CE-' + ^^OiE~') 
Nr] 



(4.25) 



for any index set T with |T| = 0(1). Now using Lemma 1.6, as in (2.30) and (2.33), we have 



\Z.\ ^ \E\ I^E-'N-'/' + ^) , (x„^(^'^-)x,) ^ E-'N-'/' + ^ 

First we estimate Gu, with (2.16), (2.2) and (2.31), 

\Gii\ = \ l - z - d - zdm^'\z) - ZA'^ 



(4.26) 



and 



-E~' ^ \Gu\ ^ 2E-' 



(4.27) 



where we used (4.25), (4.26), 77 = EN ^/'^ and the fact K is large enough. Similarly for Gij, 
from (2.17) and (2.30) it follows that 



Furthermore with (2.18) and (2.17), 



(4.28) 



I (i) I 

\m^ ' — ml 



N 



E 



GjiGij 



Gi, 



^ E^ 



Nr] 



Using these bounds. 



Gi 



1 — z — d — zdra 



+ 0(m(') - m) + 



{1 — z — d — zdmY 



+ E-^'OiZl 



and 



m 



^ E G.. = + + ^-'^-"') ' + OiE-^Z]) (4.29) 
Since $|\Re(1 - z - d - zdm)| \ge |\Im(1 - z - d - zdm)|$,



\ — ^CE^^r] + -^miz) . (4.30) 

1 — z — d — zdm 2 



Together with (4.29) and (4.26), with C-high probabihty. 

^m{z) ^ CE-^f] + E-'(^ + E-'N-'/A = ( , ^, , ^ , 
^ ^ ' V / V ^ ^ E J Nr] 

li E ^ for some e > 0, with C-high probabihty we have 

5>m<-^. (4.32) 

Nr] ^ ' 

From the observation made in (4.11), it follows that there are no eigenvalues in the interval 
[i? — ?7, + 77] with C-high probability, or equivalently there are no eigenvalues larger than 
A^^ with C-high probability. 

Now, it only remains to prove (4.32) for K\j^ "^E ^ N^ . Using the above result, maxj Aj ^ 
A^^, with C-high probability we have 



\GiA > N- 



-2e 



Therefore, applying (3.21) and (3.19) with X = N'- (^N-^^^ + y = N^^ and p = N^ 

and by using (4.24), (4.28), (4.26), (4.27), we have 



Inserting it in (4.29), with (4.25), (4.30), we obtain the conclusion (4.32) with $\zeta$-high
probability for $K\lambda_+ \le E \le N^{\varepsilon}$. Again using (4.11), we deduce that there are no eigenvalues
located in the interval $[K\lambda_+, N^{\varepsilon}]$ with $\zeta$-high probability. Thus we have achieved step 3.

Step four of the proof of (1.21): Now we prove the last component of the proof of (1.21), i.e.,
in the case of $d > 1$ and thus $N > M$, we have $\lambda_M \ge \lambda_-/5$. As remarked earlier, it only
remains to prove that for some large $K$, the following bound holds with $\zeta$-high probability,

$\lambda_M \ge \lambda_-/K$. (4.33)

Recall $\mathcal{G} = (XX^\dagger - z)^{-1}$. Let

z = E + tr], O^E^ X_/K, r] = N'^/^'^ (4.34) 
for some small enough $\varepsilon > 0$. Recall that we have proved that among $\lambda_i$, $i \le M$, there are at least $\varphi^{O(1)}$ eigenvalues less than $\lambda_-$. Then for some constants $C$, $c$,

c^l^Trg{z)^Cv + ^, c^^j^TTgiz)^C . (4.35) 
In the above, the term involving $\varphi^{O(1)}$ is contributed by these $\varphi^{O(1)}$ eigenvalues. Using Cauchy's interlacing theorem for eigenvalues, it is easy to see that (4.35) also holds for $\mathcal{G}^{(\mathbb{T})}$ with $|\mathbb{T}| = O(1)$. Then, using Lemma 1.6, with $\zeta$-high probability,



\Zi\ ^ \z\ 



iV-1/2 + ^ W |z|iV-V2+2., ^^^^ gi^,J) . ^ ^-1/2 + ^ ^ N-V2+2e _ (4^33) 

N'T] J Nrj 



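First, using (2.16), we obtain

$$G_{ii} = \Big[-z - zd\,\frac{1}{N}\operatorname{Tr}\mathcal{G}^{(i)}(z) - Z_i\Big]^{-1}. \qquad (4.37)$$

Then using (4.35) we deduce that, with $\zeta$-high probability,

$$c|z|^{-1} \le |G_{ii}| \le C|z|^{-1}. \qquad (4.38)$$

Similarly, from (2.17) it follows that with $\zeta$-high probability,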

As (1.33), we have 

TrG«(2) - Tr^(')(2) = ^ = TrG(z) - Tr^(z) + - . 

z z 

Together with (4.37), 

= (^-z - zd^ Tr g{z) - zd (^m« - m - - . 

Using the bound (see (4.35))

$$c|z| \le \Big|-z - zd\,\frac{1}{N}\operatorname{Tr}\mathcal{G}(z)\Big| \le C|z|,$$

equation (4.36) and $|m^{(i)} - m| \le (N\eta)^{-1}$, we take the average of $G_{ii}$ and use a Taylor expansion to obtain (similar to (4.29))

1 — z — d — zdm{z) ~'~ ^ ^ 

S := \z\-^0 ^(m» - m) - (iV^)-' j + \z\-^0{[Z]) + \z\-^0{N-^+^') 

with (^-high probability. Similarly, by estimating the difference Gu — Gjj, we have 

\Gii-m\^\z\^^N^'^''+^' , (4.41) 

with $\zeta$-high probability. First, for the term $m^{(i)} - m$ in (4.40), using (2.18), (4.38) and (4.41),
we have



(i) 



1 GjiGij _ 1 [G ]ii _ 1 [G ]ii I |r^2l I 

j 

Averaging $m^{(i)} - m$ over $i$, we obtain that

i ^ (,„'•) - m) = + Od^lA'-Z-C') J] I IG%,\ . (4.42) 

i i 

Since we have proved that there are at least $\varphi^{O(1)}$ non-zero eigenvalues less than $0.9\lambda_-$, under (4.34) we have, with $\zeta$-high probability,

Tr[G^] = J2 = + + 0{N) . (4.43) 

a 

These three terms come from the zero eigenvalues, the small eigenvalues (those less than $0.9\lambda_-$)
and the eigenvalues in the interval $[\lambda_-, \lambda_+]$, respectively. We denote the three terms appearing
in the r.h.s of (4.43) as Tq, T, and T„ respectively. Similarly, we have (note that here z ^ 0(1) 
is small enough) 

Nm = Tt[G] = ^LlJ^. + 0((/^^0r7"i + OiN) = (1 + Oiz)) , (4.44) 

—z —z 

with C-high probability and 



\{G%.\ i. 



\uJi)? 



(A. - zf 



|2 



|2 
The last bound implies that



Together with (4.42), we have 



1 ^(„(0 _ ,„) = J_Iim + om-'N-^''-^') . (4,45) 



Dividing (4.43) by $Nm$ (see (4.44)), for $|z|$ small enough, we have



Tr(G^) -1 



— + 0{zN'') + 0(1) . (4.46) 

JMm z 

Recall $\delta$ from (4.40). Now, combining (4.45) and (4.46) with (4.40), we obtain

(5^0 (|z-2|iV-3/2+c^ + |z-i|iV-i+f^^) + |z|-20([Z]) . (4.47) 

Now we apply Lemma 3.5 (with $X = N^{-1/2+C\varepsilon}$, $Y = C|z|$ and $p = N^{\varepsilon}$) to estimate $[Z]$.
Using Lemma 3.5, (4.36), (4.38) and (4.39), we get

\z\-^\[Z]\ ^ \z\~^N-^+^' . 

Combining the above with (4.47) gives 



Using (4.40) and the definition of $m_c$,

$$m - m_c = \frac{1}{1 - z - d - zdm(z)} - \frac{1}{1 - z - d - zdm_c(z)} + \delta,$$

which implies that

$$\left(\frac{zd}{(1 - z - d - zdm(z))(1 - z - d - zdm_c(z))} - 1\right)(m - m_c) = -\delta.$$



As above, we have $c|z| \le |1 - z - d - zdm(z)|,\ |1 - z - d - zdm_c(z)| \le C|z|$ for all $|z| \le E_0$,
for a constant $E_0$ independent of $N$. Therefore, we have

$$|m - m_c| \le |z\delta|.$$
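The self-consistent equation $m_c = 1/(1 - z - d - zdm_c)$ used in this step can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names `mp_stieltjes` and `residual` are hypothetical, and the branch is selected by the standard requirement that a Stieltjes transform has positive imaginary part when $\Im z > 0$.

```python
import cmath

def mp_stieltjes(z: complex, d: float) -> complex:
    """Solve m = 1 / (1 - z - d - z*d*m) for m.

    Clearing the denominator gives the quadratic
        z*d*m**2 + (z + d - 1)*m + 1 = 0.
    We return the root with the larger imaginary part, which is the
    Stieltjes-transform branch when Im z > 0 (assumed here).
    """
    b = z + d - 1.0
    disc = cmath.sqrt(b * b - 4.0 * z * d)
    roots = [(-b + disc) / (2.0 * z * d), (-b - disc) / (2.0 * z * d)]
    return max(roots, key=lambda m: m.imag)

def residual(m: complex, z: complex, d: float) -> complex:
    """Defect of m in the self-consistent equation (zero for an exact solution)."""
    return m - 1.0 / (1.0 - z - d - z * d * m)

# Two illustrative spectral parameters, one with d < 1 and one with d > 1.
checks = [(1.0 + 1.0j, 0.5), (2.0 + 0.1j, 2.0)]
solutions = [mp_stieltjes(z, d) for z, d in checks]
```

The quantity `residual(m, z, d)` plays the role of $\delta$ above: for the empirical $m$ it is small but non-zero, while for $m_c$ it vanishes.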
Using (4.48), we have

Furthermore, it is easy to prove that 

1 - 

-— j = 0{ri) < {Nri)-^ . 

Together with $\operatorname{Tr}G = \operatorname{Tr}\mathcal{G} - z^{-1}(N - M)$, we obtain

V 

with $\zeta$-high probability. As in (4.11), we have $\lambda_\alpha \notin [E - \eta, E + \eta]$ for $E \in [0, \lambda_-/K]$ with
large enough $K = O(1)$, obtaining (4.33). This completes step 4, and we have proved
(1.21). □

Thus we have verified (1.19), (1.20), (1.21) and (1.22), and have thus finished the proof of
Theorem 1.3.




4.2. Proof of Theorem 1.5. We give the proof in two steps: 

Proof of (1.24). Recall (4.6) and the fact that there is no eigenvalue in $(0, \lambda_-/4] \cup [4\lambda_+, +\infty)$. Thus we gather that



max 



n{E) - n,{E) 



N 



(4.49) 



holds with $\zeta$-high probability. Taking the supremum over $E$ is a standard argument for events of extremely
small probability, and we omit the details. □



Now we give the proof of (1.23). 



Proof of (1.23). The proof is very similar to the one for generalized Wigner matrices
obtained in Equation (2.25) of [14]. For the reader's sake, we reproduce that argument below.
By symmetry, we assume that 1 ^ j ^ N/2 and set E = ■jj, E' = \j. Also = (log N)(p'-^i 
for compactness of notation. From (4.49) we have 



$$n_c(E) = n(E') = n_c(E') + O(t_N/N). \qquad (4.50)$$
Clearly $E \ge \lambda_c := (\lambda_+ + 3\lambda_-)/4$, and using (4.49) we see that $E' \ge \lambda_c$ also holds with
$\zeta$-high probability. First, using (1.21) and

$$n_c(x) \sim (\lambda_+ - x)^{3/2} \quad \text{for } \lambda_c \le x \le \lambda_+, \qquad (4.51)$$

or equivalently,

$$n_c(E) = n_c(\gamma_j) = \frac{j}{N} \sim (\lambda_+ - E)^{3/2},$$

we know that (1.23) holds (possibly with a larger constant) if

$$E, E' \ge \lambda_+ - t_N N^{-2/3}.$$

Hence, we can assume that one of $E$ and $E'$ is in the interval $[\lambda_c, \lambda_+ - t_N N^{-2/3}]$. With
(4.51), this assumption implies that at least one of $n_c(E)$ and $n_c(E')$ is larger than $t_N^{3/2}/N$.
Inserting this information into (4.50), we obtain that both $n_c(E)$ and $n_c(E')$ are positive and

$$n_c(E) = n_c(E')\big[1 + O(t_N^{-1/2})\big],$$



and in particular, $\lambda_+ - E \sim \lambda_+ - E'$. Using that $n_c'(x) \sim (\lambda_+ - x)^{1/2}$ for $\lambda_c \le x \le \lambda_+$,
we obtain that $n_c'(E) \sim n_c'(E')$, and in fact $n_c'(E)$ is comparable with $n_c'(E'')$ for any $E''$
between $E$ and $E'$. Then by Taylor expansion, we have

$$|n_c(E') - n_c(E)| \ge C\,|n_c'(E)|\,|E' - E|. \qquad (4.52)$$

Since $n_c'(E) = \varrho_c(E) \sim \sqrt{\kappa}$ and $n_c(E) \sim \kappa^{3/2}$, and moreover, by $E = \gamma_j$, we also have $n_c(E) = j/N$, we obtain from (4.50) and (4.52) that

$$|E' - E| \le \frac{C\,|n_c(E') - n_c(E)|}{n_c'(E)} \le \frac{Ct_N}{N\,n_c'(E)} \le \frac{Ct_N}{N\,(n_c(E))^{1/3}} \le \frac{Ct_N}{N^{2/3}j^{1/3}},$$

which proves (1.23), again after increasing the power. □

We have proved (1.23) and (1.24) and the proof of Theorem 1.5 is finished. 

5. Universality of eigenvalues in Bulk. In this section, our goal is to prove Theorem
1.8. As mentioned in the introduction, our arguments are valid for both real and complex
valued entries. First, we consider a flow of random matrices $X_t$ satisfying the following matrix
valued stochastic differential equation

$$dX_t = \frac{1}{\sqrt{M}}\,d\beta_t - \frac{1}{2}X_t\,dt, \qquad (5.1)$$

where $\beta_t$ is a real matrix valued process whose elements are independent standard real valued
Brownian motions. The initial condition $X_0 = X = [x_{ij}]$ satisfies (1.1) and (1.2). For any
fixed $t \ge 0$, the distribution of $X_t$ coincides with that of

$$X_t \overset{d}{=} e^{-t/2}X_0 + (1 - e^{-t})^{1/2}V, \qquad (5.2)$$

where $V$ is a real matrix with Gaussian entries which have mean 0 and variance $1/M$. The
singular values of the matrix $X_t$ also satisfy a system of coupled SDEs which is also called
the Dyson Brownian motion (with a drift in our case). More precisely, let

11 = /iAr(dw) = dw 



Es4E'o.K-?l-(^l+^f^)E>o. 

i=l i<j i=l 



Wi 



(5.3) 



denote the joint distribution of the singular values of $X$ when the matrix $X$ has independent
Gaussian entries (so that $X^\dagger X$ is a Wishart random matrix). In Equation (5.3), the constant
$\beta$ takes values in $\{1, 2\}$, with $\beta = 2$ for complex entries and $\beta = 1$ for real valued entries. Also,
the normalization constant is chosen so that $\mu$ is a probability measure. Denote the distribution
of the singular values at time $t$ by $f_t(w)\mu(dw)$. Then $f_t$ satisfies

$$\partial_t f_t = \mathcal{L}f_t, \qquad (5.4)$$

where 

^»'-^?:.-i:^^?+i:(-^4E^^i(^(^^)+ 

i=l i=l \ j^i ' J 



For any $n \ge 1$ we define the $n$-point correlation functions (marginals) of the probability
measure $f_t\,d\mu$ by

$$p^{(n)}_{t,N}(w_1, w_2, \ldots, w_n) = \int f_t(w)\mu(w)\,dw_{n+1}\cdots dw_N. \qquad (5.6)$$

With a slight abuse of notation, we will sometimes also use $\mu$ to denote the density of the
measure $\mu$ with respect to the Lebesgue measure. The correlation functions of the equilibrium
measure are denoted by

$$p^{(n)}_{\mu,N}(w_1, w_2, \ldots, w_n) = \int \mu(w)\,dw_{n+1}\cdots dw_N. \qquad (5.7)$$
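To make the flow (5.2) concrete, the following sketch samples a matrix with the law of $X_t$ and checks that the entry variance $1/M$ is preserved along the flow. It is an illustration only: the helper names `ou_entry_variance` and `ou_sample`, the matrix sizes, the time $t = 0.3$ and the random-sign initial condition are all assumptions chosen for the example, not taken from the paper.

```python
import numpy as np

def ou_entry_variance(var0: float, t: float, M: int) -> float:
    """Entry variance of e^{-t/2} X_0 + sqrt(1 - e^{-t}) V when X_0 and V are
    independent and centered, with entry variances var0 and 1/M respectively."""
    return np.exp(-t) * var0 + (1.0 - np.exp(-t)) / M

def ou_sample(x0: np.ndarray, t: float, rng: np.random.Generator) -> np.ndarray:
    """Draw one sample with the law of X_t in (5.2), given x0 of shape (M, N)."""
    M = x0.shape[0]
    v = rng.normal(0.0, 1.0 / np.sqrt(M), size=x0.shape)  # Gaussian, variance 1/M
    return np.exp(-t / 2.0) * x0 + np.sqrt(1.0 - np.exp(-t)) * v

rng = np.random.default_rng(0)
M, N, t = 200, 300, 0.3
# Hypothetical non-Gaussian initial condition: random signs with variance 1/M.
x0 = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)
xt = ou_sample(x0, t, rng)

exact_var = ou_entry_variance(1.0 / M, t, M)   # stays equal to 1/M for every t
empirical_var = xt.var()                        # sample variance over all entries
```

Because the variance is constant in $t$, only the higher moments of the entries move towards their Gaussian values, which is what the moment-matching step later in this section exploits.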
Now we are ready to prove the strong local ergodicity of the Dyson Brownian motion,
which states that the correlation functions of the Dyson Brownian motion $p^{(n)}_{t,N}$ and those of
the equilibrium measure $p^{(n)}_{\mu,N}$ are close:

Theorem 5.1. Let $X = [x_{ij}]$ with the entries $x_{ij}$ satisfying (1.1) and (1.2). Let $E \in
[\lambda_- + r, \lambda_+ - r]$ with some $r > 0$. Then for any $\varepsilon' > 0$, $\delta > 0$, $0 < b = b_N < r/2$, any integer
$n \ge 1$ and for any compactly supported continuous test function $O : \mathbb{R}^n \to \mathbb{R}$ we have



sup 

t>7V-l+«+e' 



E+b 



E-b 



dE' 



dai . . . dan 0{ai, 



2b ^'"^--•••'^-,,(E)" 



(") _ J^) \ ( E' + "1 ... E' ^ — 



X \Pt,N-Pl,:N 



NQ,{Ey-' Nq,{E). 



where $p^{(n)}_{t,N}$ and $p^{(n)}_{\mu,N}$, defined in (5.6)-(5.7), are the correlation functions of the eigenvalues of the Dyson
Brownian motion flow (5.2) and those of the equilibrium measure respectively, and $C_n$ is a
constant.

Remark 5.2. Notice that if we choose $\delta = 1 - 2\varepsilon'$ and thus $t = N^{-\varepsilon'}$, then we can set
$b \sim N^{-1+8\varepsilon'}$ so that the right hand side of (5.8) vanishes as $N \to \infty$. From the MP law we
know that the spacing of the eigenvalues in the bulk is $O(N^{-1})$, and thus we see that Theorem
5.1 yields universality with almost no averaging in $E$.

Proof of Theorem 5.1. The proof follows from the main result in [10] (Theorem 2.1),
which states that the local ergodicity of Dyson Brownian motion (5.8) holds for $t \ge N^{-2a+\delta}$
for any $\delta > 0$, provided that there exists an $a > 0$ such that

$$\sup_t\ \frac{1}{N}\,\mathbb{E}\sum_{j=1}^{N}(\lambda_j(t) - \gamma_j)^2 \le CN^{-1-2a} \qquad (5.9)$$

holds with a constant $C$ uniformly in $N$. Here $\sqrt{\lambda_j(t)}$ is the singular value of the matrix $X_t$
given in (5.2). The condition (5.9) is a simple consequence of (1.23) as long as $a < 1/2$.

Strictly speaking, there are four assumptions in the hypothesis of Theorem 2.1 in [10].
Assumptions I and II of Theorem 2.1 in [10] are automatically satisfied in the setting where
the Dyson Brownian motion is generated by flows on the covariance matrix ensembles.
Assumption IV of Theorem 2.1 of [10] states that the local density of the singular values of $X_t$
on scales larger than $N^{-1+c}$, for any $c > 0$, is bounded above by a constant. As in [10], this
follows from the large deviation estimate (1.19), since a bound on $\Im m(z)$, $z = E + i\eta$, can be
easily used to prove an upper bound on the local density of eigenvalues in a window of size $\eta$
about $E$. As usual, the additional condition in [10] on the entropy, $S_\mu(f_{t_0}) \le CN^m$ for some
constant $m$ and $t_0 = N^{-2a}$, holds due to the regularization property of the Ornstein-Uhlenbeck
process. Thus for a given $0 < \varepsilon' < 1$, choosing $a = 1/2 - \varepsilon'/2$, $A = \varepsilon'$ in the second part of
Theorem 2.1 in [10] and using (1.23), we obtain (5.9) and the proof is finished. □

For any $\varepsilon > 0$, applying Theorem 5.1 with $\delta = 1 - 2\varepsilon$, $\varepsilon' = \varepsilon$ and $b = N^{-1+8\varepsilon}$, we obtain
universality for all ensembles with the matrix elements distributed according to $M^{-1/2}\xi_t$ with

$$\xi_t = e^{-t/2}\xi_0 + (1 - e^{-t})^{1/2}\xi_G, \qquad (5.10)$$

where the matrix $\xi_G$ has independent Gaussian entries with mean 0 and variance
1, $t \sim N^{-\varepsilon}$, and the initial condition $\xi_0$ has entries satisfying our conditions (1.1) and (1.2).
In other words, for $t \sim N^{-\varepsilon}$ the random matrices $\xi_t$ which are distributed according to (5.10)
have the same correlation functions as those of the matrix $\xi_G$ with Gaussian entries, averaged
over a length of $O(N^{-1+8\varepsilon})$. Thus in order to prove Theorem 1.8, it remains to find a random
matrix of the form (5.10) (with time $t = N^{-\varepsilon}$) whose eigenvalue correlation functions
approximate well those of the spectrum of the given matrix $X$ satisfying (1.1) and (1.2).

The requirements on the entries of the matrix $\xi_t$ are just mean zero, variance one and
subexponential decay; however, it turns out that for any fixed $X$ and $\varepsilon$, one may find a $\xi_0$
such that $\xi_t$ satisfies (5.10), with $t \sim N^{-\varepsilon}$, and the entries $(\xi_t)_{ij}$ have mean 0, variance 1 and
the same third moment as those of the (rescaled) initial condition $\sqrt{M}X$. Moreover, $\xi_t$ can
be chosen in such a way that its entries have fourth moment very close to those of $\sqrt{M}X$.
More precisely, Lemma 3.4 in [13] yields that for any given matrix $X$ satisfying (1.1) and
(1.2) and $t \sim N^{-\varepsilon}$, there exists a matrix $\xi_t$ of the form (5.10) such that for $1 \le k \le 3$,



E VMxf^. = E (6)J,., E {VMx.jr - E (e 



Now to finish the proof of Theorem 1.8, it only remains to show that the correlation
functions of the eigenvalues of two matrix ensembles at a fixed energy (i.e., for a fixed value
of $E = \Re(z)$) are identical up to the scale $1/N$, provided that the first four moments of the
matrix elements of these two ensembles are almost identical in the above sense. To achieve this,
as shown for the Wigner matrices in [12] (see Sections 8.6-8.13 of [12]), it is enough to show
that the corresponding Green functions are close for these two matrix ensembles. This is the
content of the following theorem which we call, following [12], the Green function comparison
theorem.

Let $X^v = [x^v_{ij}]$, with the entries $x^v_{ij}$ satisfying (1.1) and (1.2), $H^v = X^{v\dagger}X^v$ and let
$G^v(z) = (X^{v\dagger}X^v - z)^{-1} = (H^v - z)^{-1}$ be the Green function corresponding to $X^v$. Define
the matrices $X^w$, $H^w$ and the Green function $G^w(z)$ analogously.
Theorem 5.3. Assume that the first three moments of $x^v_{ij}$ and $x^w_{ij}$ are identical, i.e.,



^ n ^ 3, 



and the difference between the fourth moments of $x^v_{ij}$ and $x^w_{ij}$ is much less than 1, say



(5.11) 



for some given $\delta > 0$. Let $\varepsilon > 0$ be arbitrary and choose an $\eta$ with $N^{-1-\varepsilon} \le \eta \le N^{-1}$. For
any sequence of positive integers $k_1, \ldots, k_n$, set complex parameters

$$z_j^m = E_j^m \pm i\eta, \qquad j = 1, \ldots, k_m, \quad m = 1, \ldots, n,$$

with an arbitrary choice of the $\pm$ signs and $\lambda_- + \kappa \le |E_j^m| \le \lambda_+ - \kappa$ for some $\kappa > 0$. Let
$F(x_1, \ldots, x_n)$ be a function such that for any multi-index $\alpha = (\alpha_1, \ldots, \alpha_n)$ with $1 \le |\alpha| \le 5$
and for any $\varepsilon' > 0$ sufficiently small, we have


max 

max ||9°F(xi,...,x„)| : max|xj| ^ A^^j ^ N^" 



(5.12) 
(5.13) 



for some constant Cq. 

Then, there is a constant Ci, depending on a, ^'^^ such that for any rj with 

A^"^"^ ^ ?7 ^ and for any choices of the signs in the imaginary part of 



1 



Tr 



1 



Tr 



b=i 



-EF(G'^^G'^) (5.14) 



where in the second term the arguments of $F$ are changed from the Green functions of $H^v$
to those of $H^w$ and all other parameters remain unchanged.

Once again we note the equivalence of (5.8) and (5.14) as discussed in [12] (Sections 8.6-8.13).
The only difference is that in [12] the equivalence is proved for Wigner matrices, but
the arguments are easily adapted for covariance matrices. Thus to complete the proof of
Theorem 1.8, all that remains is Theorem 5.3, which is proved below.
Proof of Theorem 5.3. The proof is very similar to that of Lemma 2.3 of [12]. The only
differences are a few simple linear algebraic identities. Therefore, we will only prove the
simple case of $k = 1$ and $n = 1$.

Fix a bijective ordering map on the index set of the independent matrix elements,

$$\phi : \{(i, j) : 1 \le i \le M,\ 1 \le j \le N\} \to \{1, \ldots, MN\},$$

and define the family of random matrices $X_\gamma$, $0 \le \gamma \le MN$,

$$[X_\gamma]_{ij} = \begin{cases} [X^v]_{ij}, & \phi(i,j) > \gamma, \\ [X^w]_{ij}, & \phi(i,j) \le \gamma. \end{cases}$$

In particular we have $X_0 = X^v$ and $X_{MN} = X^w$. Denote $H_\gamma$, $G_\gamma$ and $\mathcal{G}_\gamma$ as

$$H_\gamma = X_\gamma^\dagger X_\gamma, \qquad G_\gamma = (H_\gamma - z)^{-1}, \qquad \mathcal{G}_\gamma = (X_\gamma X_\gamma^\dagger - z)^{-1}.$$
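The interpolating family $X_\gamma$ can be sketched in a few lines. This is an illustrative sketch only: the function name `interpolate` is hypothetical, and the row-major ordering used for $\phi$ is an arbitrary choice (the argument only needs some bijection onto $\{1, \ldots, MN\}$).

```python
import numpy as np

def interpolate(xv: np.ndarray, xw: np.ndarray, gamma: int) -> np.ndarray:
    """Return X_gamma: entries with phi(i, j) <= gamma are taken from xw,
    all remaining entries from xv. Here phi is the 1-based row-major ordering."""
    M, N = xv.shape
    phi = np.arange(1, M * N + 1).reshape(M, N)
    return np.where(phi <= gamma, xw, xv)

rng = np.random.default_rng(2)
M, N = 3, 4
xv = rng.normal(size=(M, N)) / np.sqrt(M)   # stands in for X^v
xw = rng.normal(size=(M, N)) / np.sqrt(M)   # stands in for X^w

family = [interpolate(xv, xw, g) for g in range(M * N + 1)]
# Number of entries in which consecutive members of the family differ.
changes = [int(np.count_nonzero(family[g] != family[g - 1]))
           for g in range(1, M * N + 1)]
```

Each step of the family changes exactly one entry, so the difference between the two ensembles is split into $MN$ single-entry replacements.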

First, using the delocalization result (1.22) and the rigidity of eigenvalues (1.23), it is easy 
to have the following estimate on the matrix elements of the resolvent: 

maxmax max max I [G^(z)], , I + I [^^(z)], ,1 ^ X*^^ (5.15) 

7 kl r]^N~^~^ i^^c 

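with $\zeta$-high probability for any $\zeta > 0$. For instance, for $\gamma = 0$, we have the identity $G_0(z) =
\sum_{\alpha=1}^{N} \frac{v_\alpha v_\alpha^\dagger}{\lambda_\alpha - z}$, where $\lambda_\alpha$, $v_\alpha$ are the eigenvalues and eigenvectors of $H_0$. By the delocalization
result (1.22) we obtain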



N ^ \\r 



We write the above sum as 
where Ik is the set that 

By the rigidity of eigenvalues we obtain that ^ C2^ with C-high probability. Substituting 
this bound in (5.16) yields the estimate (5.15). 

Recall that $x_i$ denotes the $i$-th column of $X$. For $1 \le i \le N$, using straightforward algebra,
it is easy to check that



, (gx,.), (xtg), _ M^) {G^'^^M^^G^% 

- ^'^^ + 1 - (x„ g{z) X,) ' - 1 + (x„ gi^){z) X,) 



From (2.16) we obtain 



■1 



z Gii 



Xi,^(2;)xi) = 1 + zGi 



l + (xi,6;W(z)x,) 
Furthermore, from (2.17) it follows that 

(x„^(^^-)x,) (x„^(''^-)x,) 
(5.18)
(5.19) 



(x„^«x,) = (xi,6;(^^-)x,) 



i + (x„6;fe-)x,) 



l + (x,,^fo-)x,) - '"^ni^^^y ^^)- Gu 



Similarly 



which implies that 



G 

(xj, Xj) = — ^, (xj, gxj) = -zGij . 



(5.20) 



(5.21) 



Let Xi be the z*^ row of X. By symmetry, the above identities also hold if one switches {G, Xj) 
and {g,Xi). 

Combining the above identities with (5.15), we obtain the bound 



max max max max 

7 kl ri^N-'^-^ \k\^c 



\[G,{z)U + \[X,G,{z)],,\ + [G.Xliz)]^^ + [X,G,Xl{z)] 



kl 



(5.22) 



with C-high probability. 

Consider the telescopic sum of differences of expectations 



EF(— Tr ^ 



- z 



1 1 
-EF ( — Tr. 



MN 

E 

7=1 >- 



N -z 



, 1 1 
E F — Tr ■ 



(5.23) 



N H^_i - z 



Let $E^{(ij)}$ denote the matrix whose matrix elements are zero everywhere except at the $(i,j)$
position, where it is 1, i.e., $E^{(ij)}_{k\ell} = \delta_{ik}\delta_{j\ell}$. Fix a $\gamma \ge 1$ and let $(i, j)$ be determined by
$\phi(i, j) = \gamma$. We will compare $H_{\gamma-1}$ with $H_\gamma$. Note that the matrices $X_{\gamma-1}$ and $X_\gamma$ differ only in the
$(i, j)$ matrix element and they can be written as

$$X_{\gamma-1} = Q + V, \quad V := x^v_{ij}E^{(ij)}, \qquad X_\gamma = Q + W, \quad W := x^w_{ij}E^{(ij)},$$

with a matrix $Q$ that has zero matrix element at the $(i, j)$ position. Define the Green functions

$$R := \frac{1}{Q^\dagger Q - z}, \qquad S := \frac{1}{H_{\gamma-1} - z}, \qquad T := \frac{1}{H_\gamma - z}.$$
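The comparison of $S$ with $R$ rests on the resolvent identity $S = R - R\,D\,S$ with $D := V^\dagger Q + Q^\dagger V + V^\dagger V$, which follows from $S^{-1} - R^{-1} = D$. The sketch below checks this identity and its first iterate numerically on a small real example; the sizes, the spectral parameter `z` and the position `(i, j)` are arbitrary illustrative choices, and the helper name `resolvent` is hypothetical.

```python
import numpy as np

def resolvent(a: np.ndarray, z: complex) -> np.ndarray:
    """Green function (A^dagger A - z)^{-1} of the matrix A."""
    n = a.shape[1]
    return np.linalg.inv(a.conj().T @ a - z * np.eye(n))

rng = np.random.default_rng(1)
M, N = 6, 4
z = 0.7 + 0.05j
X = rng.normal(size=(M, N)) / np.sqrt(M)

i, j = 2, 1                      # position of the single replaced entry
V = np.zeros((M, N))
V[i, j] = X[i, j]                # V carries only the (i, j) entry
Q = X - V                        # Q has a zero at position (i, j)

R = resolvent(Q, z)
S = resolvent(X, z)
D = V.T @ Q + Q.T @ V + V.T @ V  # equals X^T X - Q^T Q for real matrices

err_first = np.abs(S - (R - R @ D @ S)).max()
# Iterating once more gives the start of the expansion in powers of D.
err_second = np.abs(S - (R - R @ D @ R + R @ D @ R @ D @ S)).max()
```

Iterating the identity further produces an expansion of $S$ in powers of $D$ with a remainder still containing $S$, and sorting its terms by the number of factors of $V$ is what links the expansion to the moments of the replaced entry.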



The following lemma is at the heart of the Green function comparison method, first established in
[12] (and subsequently used in [13, 14, 7]). It states that the difference of smooth functionals
of the Green functions of two matrices which differ in a single entry can be bounded
in terms of the first four moments of that entry.



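Lemma 5.4. Let $m_k$ be the $k$-th moment of $\sqrt{M}x^v_{ij}$. Then

$$\mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}S\Big) - \mathbb{E}F\Big(\frac{1}{N}\operatorname{Tr}R\Big) = A(Q, m_1, m_2, m_3) + O(N^{-5/2+C\varepsilon}) + \widetilde{A}(Q)\,m_4 \qquad (5.24)$$

for a functional $A(Q, m_1, m_2, m_3)$ which only depends on the distribution of $Q$ and $m_1, m_2, m_3$.
The constant $\widetilde{A}(Q)$ depends only on the distribution of $Q$ and satisfies the bound

$$|\widetilde{A}(Q)| \le N^{-2+C\varepsilon}.$$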



Before giving the proof of Lemma 5.4, let us use it to conclude the foregoing argument in
the proof of Theorem 5.3. Note that the matrices $X_\gamma$ and $Q$ also differ by one entry, and
therefore applying Lemma 5.4 yields



E 



F(-TrT 



Fi-TrR 



A{Q, mi, m2, ma) + ^ ^(q)^^ (5.25) 



where $m_4'$ is the fourth moment of $\sqrt{M}x^w_{ij}$ (by hypothesis, the first three moments of $x^w_{ij}$ are
identical to those of $x^v_{ij}$). Since $|m_4' - m_4| \le N^{-\delta}$ by hypothesis, we have



EF 



N 



Tr 



EF 



Tr 



N H, 



7-1 



Using the above estimate and summing over $\gamma$ yields (see (5.23))



EF 



N - z 



EF 



- z 
obtaining precisely what we set out to show in (5.14). The proof can be easily generalized
to functions of several variables. Thus to conclude the proof of Theorem 5.3, we just need
to give the proof of Lemma 5.4.

Proof of Lemma 5.4. We first claim that the estimate (5.15) holds for the Green function
$R$ as well. To see this, from the resolvent expansion we obtain

R = s+s{v^x+x^v+vW)s+. . .+[S{y^x+x^v+v^v)fs+[S{y^x+x^v+v^v)Y^R . 

Since the matrix $V$ has at most one non-zero entry, when computing the $(k, \ell)$ matrix
element of the matrix identity above, each term is a finite sum involving matrix elements of
$S$, $XS$, $SX^\dagger$, $XSX^\dagger$ or $R$ (only for the last term) and $x^v_{ij}$. Using the bound (5.22) for the
$S$ matrix elements, the subexponential decay of $x^v_{ij}$ and the trivial bound $|R_{ij}| \le \eta^{-1}$, we
obtain that the estimate (5.15) holds for $R$. Similarly, by expanding $XR$, $RX^\dagger$ and $XRX^\dagger$, we
can obtain (5.22) for $XR$, $RX^\dagger$, $XRX^\dagger$, $QR$, $RQ^\dagger$ and $QRQ^\dagger$.

Now we prove (5.24). By the resolvent expansion, 

S = R- RiV^Q + QV + V^V)R + ...- [RiV^Q + Q^V + V^V)fR + 0{N-^) (5.26) 

holds with extremely high probability. Thus we may write 

lTrS = lTr/2+5^y, + 0(iV-^), 
fcs:20 

where $y_k$ is the sum of the terms in (5.26) in which there are exactly $k$ factors of $V$. Recall that $m_k$ is
the $k$-th moment of $\sqrt{M}x_{ij}$, which is $O(1)$ if $k = O(1)$. The terms $y_k$ satisfy the bound (with
$K = (k_1, k_2, \ldots, k_n)$ and $|K| := \sum_i k_i$)

\yk\ ^ iV^^'iV-'=/^ ^.Vk^Vk, ■■■yk^= iV-l^l/^miKi zk{Q)^ \zk{Q)\ ^ N^"' (5.27) 

for some $z_K(Q)$ which only depends on the distribution of $Q$; the last inequality holds with
$\zeta$-high probability. Here $\mathbb{E}_V$ is the expectation value with respect to the distribution of the
entries of the matrix $V$. Then we have

EF(lTr^-^)=Ej:lF(")(lTri?) 5^,, + 0(iV-/2.c.) _ (5.28) 
From (5.27) we obtain 

EF(lTr^-^)=EX:V"^(^Tri?) f J] iVl-l/^ m,,, .,(g)) + 0(iV-/^-^^) 



56 



= B + 0(iV-^/2+c.) ^ ^^Q^ m^,m2, m-,) + A{Q)m^ 



where A{Q, 7711,7712,771^) only depends on the distribution of Q and mi,m2,m3 and 
^ = ^ E Tr i?) ( 5: iV-l-l/^ m,., .,(Q) ) , 

n=0 ■ \ki,...,k„:\K\^5,ki^20 J 



I(Q)=EX:;^f'»'(^TVfi) ( 

n=0 ' Y 



ki,...,kn:\K\=4: 



In the above = Yli ^i- Now it only remains to prove 

\B\ ^ 0(A^"^/2+C7e)^ 2(g) ^ 0(A^-2+^") . 

Using the estimate (5.22) for $R$ and the derivative bounds (5.12) for the typical values of
$\frac{1}{N}\operatorname{Tr}R$, we see that $F^{(n)}\big(\frac{1}{N}\operatorname{Tr}R\big)$ ($n \le 4$) are bounded by $N^{C\varepsilon}$ with $\zeta$-high probability.
Similarly, $z_K$ ($k_i \le 20$) is also bounded by $N^{C\varepsilon}$ for some $C > 0$ with $\zeta$-high probability. Now
we define $\Xi_g$ as the good set where these quantities are bounded by $N^{C\varepsilon}$. Furthermore, using
(5.13) and the definition of $z_K$, we know that $F^{(n)}\big(\frac{1}{N}\operatorname{Tr}R\big)$ and $z_K$ are bounded by $N^{C}$ for some
$C > 0$ in $\Xi_g^c$. Since $\Xi_g^c$ has a very small probability by (5.22), we have

^(Q) =EH,X^^F(")(^Tri?) [ Yl N-'zk{Q)] + 0(iV-^/2+^^) . 

n=0 \fei,...,fc„:|i^|=4 / 

Then with the bounds on $F^{(n)}$ and $z_K$ in $\Xi_g$, we obtain $\widetilde{A}(Q) \le O(N^{-2+C\varepsilon})$. Similarly, with
$m_{|K|} \le O(1)$, we have $B \le O(N^{-5/2+C\varepsilon})$, completing the proof of Lemma 5.4 and thereby
also finishing the proof of Theorem 5.3. □

6. Universality of eigenvalues at Edge. In this section we give the proof of edge universality
stated in Theorem 1.10. The proof is loosely based on Theorem 2.4 of [14], which
is an analogous result for Wigner matrices, but here we introduce new ideas to circumvent
some key difficulties. In the following we consider the largest eigenvalue $\lambda_1$, but the same
argument applies to the lowest non-zero eigenvalue as well. Also, for the rest of this section,
let us fix a constant $\zeta > 0$.

For any $E_1 \le E_2$ let

$$\mathcal{N}(E_1, E_2) := \#\{E_1 \le \lambda_j \le E_2\}$$

denote the number of eigenvalues of the covariance matrix $X^\dagger X$ in $[E_1, E_2]$, where $X$ is a
random matrix whose entries satisfy (1.1) and (1.2). By Theorems 1.3 and 1.5 (rigidity of
eigenvalues), there exists a positive constant $C_\zeta$ such that

|Ai-A+| (Q_i) 

holds with $\zeta$-high probability. Using these estimates, we can assume that the parameter $s$ in
(1.30) satisfies

^ s ^ . (6.3) 

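Set

$$E_\zeta := \lambda_+ + 2\varphi^{C_\zeta}N^{-2/3} \qquad (6.4)$$

and for any $E \le E_\zeta$ define

$$\chi_E := \mathbf{1}_{[E, E_\zeta]}$$

to be the characteristic function of the interval $[E, E_\zeta]$. For any $\eta > 0$ we define

$$\theta_\eta(x) := \frac{\eta}{\pi(x^2 + \eta^2)} = \frac{1}{\pi}\,\Im\frac{1}{x - i\eta} \qquad (6.5)$$

to be an approximate delta function on scale $\eta$. In the following elementary lemma we
compare the sharp counting function $\mathcal{N}(E, E_\zeta) = \operatorname{Tr}\chi_E(H)$ with its approximation smoothed
on scale $\eta$. Notice that for any $\ell > 0$,

$$\operatorname{Tr}\chi_{E-\ell} * \theta_\eta(H) = N\,\frac{1}{\pi}\int_{E-\ell}^{E_\zeta}\Im m(y + i\eta)\,dy.$$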

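Let us fix $\varepsilon > 0$ and set

$$\eta_1 := N^{-2/3-9\varepsilon}. \qquad (6.6)$$

Lemma 6.1. For any $\varepsilon > 0$, set $\ell_1 := N^{-2/3-3\varepsilon}$. Then for any $E$ satisfying

$$|E - \lambda_+| \le \tfrac{3}{2}\varphi^{C_\zeta}N^{-2/3}, \qquad (6.7)$$

where the constant $C_\zeta$ is as in (6.1)-(6.4), the bound

$$|\operatorname{Tr}\chi_E(H) - \operatorname{Tr}\chi_E * \theta_{\eta_1}(H)| \le C\big(N^{-2\varepsilon} + \mathcal{N}(E - \ell_1, E + \ell_1)\big) \qquad (6.8)$$

holds with $\zeta$-high probability.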
Proof of Lemma 6.1. From Equations (6.1), (6.2) above, and (6.13) and (the first line of)
(6.17) of [14], we obtain

I Tr xe{H) - Tr * 9^, {H)\ ^ C {U{E -£,^E + ii) + N-'') (6.9) 

+CNr]^{E^-E) [ -J^Qm{E-y + zi,)dy. 

Using the rigidity of eigenvalues (1.23), one can prove that 

with C-high probability. On the interval ^ E — y ^ we use (1.19), i.e., 

c 

Qrn{E-y + it^) ^ Qm,{E -y + + ^ 



and the elementary estimate $\Im m_c(E - y + i\ell_1) \le C\sqrt{\ell_1 + |E - y - \lambda_+|}$. Using the definitions
of $\ell_1$ and $\eta_1$ it can be shown that (see Equation (6.18) of [14])

Nr]i{E^-E) [ -^^^m{E-y + ti,)dy^N-^' . 
Now the Lemma follows from (6.9). □ 

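Let $q : \mathbb{R} \to \mathbb{R}_+$ be a smooth cutoff function such that

$$q(x) = 1 \ \text{ if } |x| \le 1/9, \qquad q(x) = 0 \ \text{ if } |x| \ge 2/9,$$

and we assume that $q(x)$ is decreasing for $x \ge 0$. Then we have the following corollary of
Lemma 6.1 (which is the counterpart of Corollary 6.2 in [14]):

Corollary 6.2. Let $\ell_1$ be as in Lemma 6.1 and set $\ell := \tfrac{1}{2}\ell_1N^{2\varepsilon} = \tfrac{1}{2}N^{-2/3-\varepsilon}$. Then for
all $E$ such that

$$|E - \lambda_+| \le \varphi^{C_\zeta}N^{-2/3}, \qquad (6.10)$$

where the constant $C_\zeta$ is as in (6.1)-(6.4), the following inequality

$$\operatorname{Tr}\chi_{E+\ell} * \theta_{\eta_1}(H) - N^{-\varepsilon} \le \mathcal{N}(E, \infty) \le \operatorname{Tr}\chi_{E-\ell} * \theta_{\eta_1}(H) + N^{-\varepsilon} \qquad (6.11)$$

holds with $\zeta$-high probability. Furthermore, there exists $N_0 \in \mathbb{N}$ independent of $E$ such that
for all $N \ge N_0$,

$$\mathbb{E}\,q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_{\eta_1}(H)\big) \le \mathbb{P}(\mathcal{N}(E, \infty) = 0) \le \mathbb{E}\,q\big(\operatorname{Tr}\chi_{E+\ell} * \theta_{\eta_1}(H)\big) + Ce^{-c\varphi^{O(1)}}. \qquad (6.12)$$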
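Proof. For any $E$ satisfying (6.10) we have $E_\zeta - E \ge \ell$, thus $|E - \lambda_+ - \ell|\,N^{2/3} \le \tfrac{3}{2}\varphi^{C_\zeta}$
(see (6.7)), therefore (6.8) holds for $E$ replaced with $y \in [E - \ell, E]$ as well. We thus obtain

$$\operatorname{Tr}\chi_E(H) \le \ell^{-1}\int_{E-\ell}^{E} dy\,\operatorname{Tr}\chi_y(H)$$

$$\le \ell^{-1}\int_{E-\ell}^{E} dy\,\operatorname{Tr}\chi_y * \theta_{\eta_1}(H) + C\ell^{-1}\int_{E-\ell}^{E} dy\,\big[N^{-2\varepsilon} + \mathcal{N}(y - \ell_1, y + \ell_1)\big]$$

$$\le \operatorname{Tr}\chi_{E-\ell} * \theta_{\eta_1}(H) + CN^{-2\varepsilon} + C\frac{\ell_1}{\ell}\,\mathcal{N}(E - 2\ell, E + \ell)$$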

holds with $\zeta$-high probability. From (1.24), (6.10), $\ell_1/\ell = 2N^{-2\varepsilon}$ and $\ell \le N^{-2/3}$ we gather
that

n .E+i . 

-}M{E -2i,E + i)^ N^~^' / g,{x)dx + iV"2^(log N)^^ ^ -N~' 

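holds with $\zeta$-high probability, where we estimated the explicit integral using the fact that the
integration domain is in a $CN^{-2/3}\varphi^{C_\zeta}$-vicinity of the edge at $\lambda_+$. We have thus proved

$$\mathcal{N}(E, E_\zeta) = \operatorname{Tr}\chi_E(H) \le \operatorname{Tr}\chi_{E-\ell} * \theta_{\eta_1}(H) + N^{-\varepsilon}.$$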

Using (6.1) we can replace Af{E,E(^) by J\f{E, oo) with a change of probability of at most 
0(e^^ ^) . This proves the upper bound of (6.11) and the lower bound can be proved 
similarly. 

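When the event (6.11) holds, the condition $\mathcal{N}(E, \infty) = 0$ implies that $\operatorname{Tr}\chi_{E+\ell} * \theta_{\eta_1}(H) \le
1/9$. Thus we have

$$\mathbb{P}(\mathcal{N}(E, \infty) = 0) \le \mathbb{P}\big(\operatorname{Tr}\chi_{E+\ell} * \theta_{\eta_1}(H) \le 1/9\big) + Ce^{-c\varphi^{O(1)}}. \qquad (6.13)$$

Together with the Markov inequality, this proves the upper bound in (6.12). For the lower
bound, we use

$$\mathbb{E}\,q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_{\eta_1}(H)\big) \le \mathbb{P}\big(\operatorname{Tr}\chi_{E-\ell} * \theta_{\eta_1}(H) \le 2/9\big) \le \mathbb{P}\big(\mathcal{N}(E, \infty) \le 2/9 + N^{-\varepsilon}\big) = \mathbb{P}(\mathcal{N}(E, \infty) = 0),$$

where we used the upper bound from (6.11) and that $\mathcal{N}(E, \infty)$ is an integer. This completes
the proof of Corollary 6.2. □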

6.1. Green Function Comparison Theorem. Recall the matrices $X^v = [x^v_{ij}]$, $X^w = [x^w_{ij}]$,
$H^v = (X^v)^\dagger X^v$, $H^w = (X^w)^\dagger X^w$ and their respective Green functions $G^v$, $G^w$ from Section
5. Define $m^v(z) = \frac{1}{N}\operatorname{Tr}G^v(z)$ and $m^w(z) = \frac{1}{N}\operatorname{Tr}G^w(z)$.

Also notice from (6.5) that $\frac{1}{N}\operatorname{Tr}\theta_\eta(H) = \frac{1}{\pi}\Im m(i\eta)$. Corollary 6.2 bounds the probability of
$\mathcal{N}(E, \infty) = 0$ in terms of the expectations of two functionals of Green functions. In this
subsection, we show that the difference between the expectations of these functionals with respect to the
two ensembles $X^v$ and $X^w$ is negligible assuming their second moments match. The precise
statement is the following Green function comparison theorem on the edges. All statements
are formulated for the upper spectral edge $\lambda_+$, but identical arguments hold for the lower
spectral edge $\lambda_-$ as well.



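Theorem 6.3 (Green function comparison theorem on the edge). Let $F : \mathbb{R} \to \mathbb{R}$ be a
function whose derivatives satisfy

$$\max_x |F^{(\alpha)}(x)|\,(|x| + 1)^{-C_1} \le C_1, \qquad \alpha = 1, 2, 3, 4, \qquad (6.14)$$

with some constant $C_1 > 0$. Then there exist $\varepsilon_0 > 0$ and $N_0 \in \mathbb{N}$, depending only on $C_1$, such
that for any $\varepsilon < \varepsilon_0$ and $N \ge N_0$, and for any real numbers $E$, $E_1$ and $E_2$ satisfying

$$|E - \lambda_+| \le N^{-2/3+\varepsilon}, \qquad |E_1 - \lambda_+| \le N^{-2/3+\varepsilon}, \qquad |E_2 - \lambda_+| \le N^{-2/3+\varepsilon},$$

and $\eta = N^{-2/3-\varepsilon}$, we have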



E^F [Nt] 3 nf{z)) - E^F (iVr/ m^(z)) 



and 



/ rE2 \ / /•-B2 

WF\^ j dy ^ {y + ir]) j - F i^N J dy ^ {y + i?]) 



= E + ir], (6.15) 
(6.16) 



Theorem 6.3 holds in much greater generality. We state the following extension which can 
be used to prove (1.31), the generalization of Theorem 1.10. The class of functions F in the 
following theorem can be enlarged to allow some polynomially increasing functions similar 
to (6.14). But for our application of the above theorem to prove (1.31), the following form 
is sufficient. 

Theorem 6.4. Suppose that the assumptions of Theorem 1.10 hold. Fix any $k \in \mathbb{N}_+$ and
let $F : \mathbb{R}^k \to \mathbb{R}$ be a bounded smooth function with bounded derivatives. Then there exist
$\varepsilon_0 > 0$ and $N_0 \in \mathbb{N}$, depending only on $C_1$, such that for any $\varepsilon < \varepsilon_0$ and $N \ge N_0$, there exists $\delta > 0$
such that for any sequence of real numbers $E_k < \ldots < E_1 < E_0$ with $|E_j - \lambda_+| \le N^{-2/3+\varepsilon}$,
$j = 0, 1, \ldots, k$, and $\eta = N^{-2/3-\varepsilon}$, we have



E'^fI^NJ dyQm{y + iri),...,N j dy ^ m{y + ir]) j - E"^ F {m'' ^ m 
(6.17)

where in the second term the arguments of $F$ are changed from $m^v$ to $m^w$ and all other
parameters remain unchanged.

Proof. The proof of Theorem 6.4 is similar to that of Theorem 6.3 and will be omitted. 

□ 

Before proceeding further, let us state the following theorem, which gives a sufficient criterion
for proving edge universality for matrix ensembles of the form $Y^\dagger Y$ for various types of
data matrices $Y$. Let $Y_{M\times N} = [y_{ij}]$, $Z_{M\times N} = [z_{ij}]$ be two matrix ensembles, and set $H^y =
Y^\dagger Y$, $H^z = Z^\dagger Z$. Define the corresponding Green functions $G^y = (H^y - z)^{-1}$, $G^z = (H^z -
z)^{-1}$ and denote their respective empirical Stieltjes transforms by $m^y$, $m^z$.

Theorem 6.5. Assume that the matrices $Y$, $Z$ satisfy the conclusions stated in items
(i), (ii) and (iii) of Theorem 1.3. Furthermore, assume that $m^y$ and $m^z$ satisfy the conclusions
of Theorems 6.3 and 6.4. Then the asymptotic eigenvalue distributions of the matrices
$H^y$, $H^z$ at the edge are identical, i.e., the conclusions of Theorem 1.10 are satisfied with
$X^v = Y$ and $X^w = Z$.

Remark 6.6. Thus our results can be used to show edge universality for cases far beyond
covariance matrices. In [27] we use Theorem 6.5 to prove the edge universality of
correlation matrices.

Proof of Theorem 6.5. Indeed, from the calculations done in the previous two subsections,
it is clear that for the arguments used in our application of the Green function comparison
method to go through, all we need are the strong MP law and the rigidity of eigenvalues
(items (i), (ii) and (iii) of Theorem 1.3) and Theorems 6.3 and 6.4. □

We first prove Theorem 1.10 assuming that Theorem 6.3 holds, and then give the
proof of Theorem 6.3.

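Proof of Theorem 1.10. Define $E_\zeta$ as in (6.4) with a constant $C_\zeta$ such that (6.1)
and (6.2) hold. Therefore we can assume that (6.3) holds for the parameter $s$. Let $E :=
\lambda_+ + sN^{-2/3}$, so that $|E - \lambda_+| \le \varphi^{C_\zeta}N^{-2/3}$. Using (6.12), for any sufficiently small $\varepsilon > 0$, we
have

$$\mathbb{E}^w q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_{\eta_1}(H)\big) \le \mathbb{P}^w(\mathcal{N}(E, \infty) = 0)$$

with

$$\ell := \tfrac{1}{2}N^{-2/3-\varepsilon}, \qquad \eta_1 := N^{-2/3-9\varepsilon}.$$

Recall that by definition,

$$\operatorname{Tr}\chi_{E-\ell} * \theta_{\eta_1}(H) = N\,\frac{1}{\pi}\int_{E-\ell}^{E_\zeta}\Im m(y + i\eta_1)\,dy.$$

The bound (6.16) applied to the case $E_1 = E - \ell$ and $E_2 = E_\zeta$ shows that there exists $\delta > 0$
such that

$$\mathbb{E}^v q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_{\eta_1}(H)\big) \le \mathbb{E}^w q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_{\eta_1}(H)\big) + N^{-\delta}. \qquad (6.18)$$

Then applying the right side of (6.12) in Corollary 6.2 to the l.h.s. of (6.18), we have

$$\mathbb{P}^v(\mathcal{N}(E - 2\ell, \infty) = 0) \le \mathbb{E}^v q\big(\operatorname{Tr}\chi_{E-\ell} * \theta_{\eta_1}(H)\big) + C\exp\big[-c\varphi^{O(1)}\big].$$

Combining these inequalities, we have

$$\mathbb{P}^v(\mathcal{N}(E - 2\ell, \infty) = 0) \le \mathbb{P}^w(\mathcal{N}(E, \infty) = 0) + 2N^{-\delta} \qquad (6.19)$$

for sufficiently small $\varepsilon > 0$ and sufficiently large $N$. Recalling that $E = \lambda_+ + sN^{-2/3}$, this
proves the first inequality of (1.30) and, by switching the roles of $v$ and $w$, the second inequality
of (1.30) as well. This completes the proof of Theorem 1.10. □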

Proof of Theorem 6.3. The proof is similar to that of Lemma 5.3. We need to compare
the matrices $H^{\mathbf v}$ and $H^{\mathbf w}$. Instead of replacing the matrix elements one by one ($NM$
times) and comparing their successive differences, here we estimate the successive differences
of matrices which differ by a column. Indeed, for $1 \le \gamma \le N$, denote by $X_\gamma$ the random
matrix whose $j$-th column is the same as that of $X^{\mathbf w}$ if $j \le \gamma$ and that of $X^{\mathbf v}$ otherwise; in
particular $X_0 = X^{\mathbf v}$ and $X_N = X^{\mathbf w}$. As before, we define

\[ H_\gamma = X_\gamma^\dagger X_\gamma . \]
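
The column-by-column interpolation can be made concrete with a small numerical sketch. It assumes the convention $X_0 = X^{\mathbf v}$, $X_N = X^{\mathbf w}$ (columns $1,\dots,\gamma$ taken from $X^{\mathbf w}$); the helper names `interpolate_columns` and `drop_column` and the two entry distributions are illustrative choices, not taken from the paper. The loop checks the fact used later: $X_{\gamma-1}$ and $X_\gamma$ agree once the $\gamma$-th column is removed.

```python
import numpy as np

def interpolate_columns(Xv, Xw, gamma):
    """Return X_gamma: columns 1..gamma (1-based) from Xw, the rest from Xv.

    Assumed convention: X_0 = Xv and X_N = Xw.
    """
    X = Xv.copy()
    X[:, :gamma] = Xw[:, :gamma]
    return X

def drop_column(X, gamma):
    """Remove the gamma-th column (1-based), giving the minor X^{(gamma)}."""
    return np.delete(X, gamma - 1, axis=1)

rng = np.random.default_rng(0)
M, N = 6, 4
# Two ensembles with the same first two moments (mean 0, variance 1/M).
Xv = rng.standard_normal((M, N)) / np.sqrt(M)
Xw = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)

for gamma in range(1, N + 1):
    prev = interpolate_columns(Xv, Xw, gamma - 1)
    curr = interpolate_columns(Xv, Xw, gamma)
    # Consecutive matrices coincide after deleting the swapped column.
    assert np.array_equal(drop_column(prev, gamma), drop_column(curr, gamma))
```

Because only one column changes per step, each comparison reduces to a statement about a single column that is independent of the common minor.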

We will compare $H_{\gamma-1}$ with $H_\gamma$ using the following lemma. For simplicity, we denote

\[ \tilde m^{(i)}(z) = m^{(i)}(z) - (Nz)^{-1} . \]

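Lemma 6.7. For any random matrix $X$ whose entries satisfy (1.1) and (1.2), if
$|E - \lambda_+| \le N^{-2/3+\varepsilon}$ and $N^{-2/3} \ge \eta \ge N^{-2/3-\varepsilon}$ for some $\varepsilon > 0$, then with $z = E + i\eta$ we have

\[ \mathbb{E} F\bigl(N\eta\,\Im\, m(z)\bigr) - \mathbb{E} F\bigl(N\eta\,\Im\, \tilde m^{(i)}(z)\bigr) = A(X^{(i)}, m_1, m_2) + O\bigl(N^{-7/6+C\varepsilon}\bigr) \tag{6.20} \]

where the functional $A(X^{(i)}, m_1, m_2)$ only depends on the distribution of $X^{(i)}$ and the first
two moments $m_1, m_2$ of $\sqrt{M}\,x_{ji} = \sqrt{M}\,(X)_{ji}$ ($1 \le j \le M$).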



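Notice that $X_\gamma^{(\gamma)}$ is equal to $X_{\gamma-1}^{(\gamma)}$. We also have that the first two moments of the entries of
$X^{\mathbf v}$ and $X^{\mathbf w}$ are identical. Thus Lemma 6.7 implies that

\[ \mathbb{E} F\Bigl(\eta\,\Im\,\mathrm{Tr}\,\frac{1}{H_{\gamma-1} - z}\Bigr) - \mathbb{E} F\Bigl(\eta\,\Im\,\mathrm{Tr}\,\frac{1}{H_{\gamma} - z}\Bigr) = O\bigl(N^{-7/6+C\varepsilon}\bigr) . \tag{6.21} \]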



As done in Theorem 5.3, the proof of Theorem 6.3 can now be completed via the telescoping
argument. Thus, to finish the proof of Theorem 6.3, all that remains to be shown is Lemma 6.7,
which is proven below. □
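
For orientation, the telescoping step can be written out as follows. This is a sketch assuming the convention $H_0 = H^{\mathbf v}$, $H_N = H^{\mathbf w}$ for the interpolating matrices and a per-column error of order $N^{-7/6+C\varepsilon}$ as in (6.21); the constant $C$ is not tracked.

```latex
\mathbb{E}^{\mathbf v} F\Bigl(\eta\,\Im\,\mathrm{Tr}\,\frac{1}{H - z}\Bigr)
 - \mathbb{E}^{\mathbf w} F\Bigl(\eta\,\Im\,\mathrm{Tr}\,\frac{1}{H - z}\Bigr)
 = \sum_{\gamma=1}^{N} \Bigl[
     \mathbb{E} F\Bigl(\eta\,\Im\,\mathrm{Tr}\,\frac{1}{H_{\gamma-1} - z}\Bigr)
   - \mathbb{E} F\Bigl(\eta\,\Im\,\mathrm{Tr}\,\frac{1}{H_{\gamma} - z}\Bigr) \Bigr]
 = N \cdot O\bigl(N^{-7/6+C\varepsilon}\bigr)
 = O\bigl(N^{-1/6+C\varepsilon}\bigr).
```

Replacing whole columns costs only $N$ steps instead of $NM$ entry swaps, which is why a per-step error of order $N^{-7/6+C\varepsilon}$ suffices.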

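Proof of Lemma 6.7. Fix $\zeta > 0$, $\varepsilon > 0$ and, without loss of generality, assume that
$i = 1$. Recall that $N^{-2/3} \ge \eta \ge N^{-2/3-\varepsilon}$ and $|E - \lambda_+| \le N^{-2/3+\varepsilon}$. First, we claim the
following bounds for $G^{(1)}$ and $\mathcal{G}^{(1)}$:

\[ \bigl|\bigl(\mathbf{x}_1, (\mathcal{G}^{(1)}(z))^2\, \mathbf{x}_1\bigr)\bigr| \le N^{1/3+C\varepsilon}, \qquad z = E + i\eta, \tag{6.22} \]

\[ \bigl|[\mathcal{G}^{(1)}(z)]_{ij}\bigr| \le N^{C\varepsilon}, \qquad \bigl|\bigl([\mathcal{G}^{(1)}(z)]^2\bigr)_{ij}\bigr| \le N^{1/3+C\varepsilon}, \qquad z = E + i\eta, \tag{6.23} \]

with $\zeta$-high probability for some $C > 0$. In the above, we allow $i = j$. We postpone the proof
of these bounds to the end.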

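Now using (2.16) and (2.18), we have

\[ \mathrm{Tr}\, G - \mathrm{Tr}\, G^{(1)} + z^{-1} = (G_{11} + z^{-1}) + \frac{(\mathbf{x}_1, \mathcal{G}^{(1)}(z)\,\mathbf{x}_1) + z\,(\mathbf{x}_1, (\mathcal{G}^{(1)})^2(z)\,\mathbf{x}_1)}{-z - z\,(\mathbf{x}_1, \mathcal{G}^{(1)}(z)\,\mathbf{x}_1)} = z\,G_{11}\,\bigl(\mathbf{x}_1, (\mathcal{G}^{(1)})^2(z)\,\mathbf{x}_1\bigr) , \tag{6.24} \]
where $\mathbf{x}_1$ denotes the first column of the matrix $X$. Define the quantity $B$ to be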

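\[ B = -z\,m_c \Bigl[ \bigl(\mathbf{x}_1, \mathcal{G}^{(1)}(z)\,\mathbf{x}_1\bigr) - \Bigl( \frac{-1}{z\,m_c(z)} - 1 \Bigr) \Bigr] . \]

By (2.16),

\[ B = -z\,m_c \Bigl[ \frac{-1}{z\,G_{11}(z)} - \frac{-1}{z\,m_c(z)} \Bigr] = \frac{m_c - G_{11}}{G_{11}} . \tag{6.25} \]

From (1.20), we obtain that

\[ |B| \le N^{-1/3+2\varepsilon} \ll 1 \tag{6.26} \]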
with $\zeta$-high probability. Therefore, we have the identity

\[ G_{11} = \frac{m_c}{1 + B} = m_c \sum_{k \ge 0} (-B)^k . \tag{6.27} \]
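
The two exact identities behind this step, $G_{11} = \bigl(-z - z(\mathbf{x}_1, \mathcal{G}^{(1)}\mathbf{x}_1)\bigr)^{-1}$ and $\mathrm{Tr}\,G - \mathrm{Tr}\,G^{(1)} + z^{-1} = z\,G_{11}(\mathbf{x}_1, (\mathcal{G}^{(1)})^2\mathbf{x}_1)$, can be sanity-checked numerically. The sketch below assumes real entries, $G = (X^\dagger X - z)^{-1}$, $G^{(1)} = ((X^{(1)})^\dagger X^{(1)} - z)^{-1}$ and $\mathcal{G}^{(1)} = (X^{(1)}(X^{(1)})^\dagger - z)^{-1}$, with $X^{(1)}$ obtained by deleting the first column; the function name and the parameter values are illustrative only.

```python
import numpy as np

def resolvent_identity_gaps(M=40, N=25, z=1.3 + 0.2j, seed=0):
    """Return the absolute errors in the trace identity and the Schur formula."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((M, N)) / np.sqrt(M)   # M x N, entry variance 1/M
    x1, X1 = X[:, 0], X[:, 1:]                     # first column and the minor X^{(1)}

    G = np.linalg.inv(X.T @ X - z * np.eye(N))            # N x N resolvent
    G1 = np.linalg.inv(X1.T @ X1 - z * np.eye(N - 1))     # (N-1) x (N-1) resolvent
    Gc1 = np.linalg.inv(X1 @ X1.T - z * np.eye(M))        # M x M resolvent of the minor

    # Tr G - Tr G^{(1)} + 1/z  versus  z * G_11 * (x1, (Gc1)^2 x1)
    lhs = np.trace(G) - np.trace(G1) + 1 / z
    rhs = z * G[0, 0] * (x1 @ Gc1 @ Gc1 @ x1)

    # G_11 versus the Schur complement formula 1 / (-z - z (x1, Gc1 x1))
    schur = 1 / (-z - z * (x1 @ Gc1 @ x1))
    return abs(lhs - rhs), abs(G[0, 0] - schur)

gap_trace, gap_schur = resolvent_identity_gaps()
print(gap_trace, gap_schur)
```

Both gaps should be at the level of floating-point rounding, since the identities are exact algebraic consequences of the Schur complement formula and hold for any $z$ off the spectrum.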

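Define $y$ via the l.h.s. of (6.24),

\[ y := \eta\bigl(\mathrm{Tr}\, G - \mathrm{Tr}\, G^{(1)} + z^{-1}\bigr) \tag{6.28} \]

so that we have

\[ N\eta\,\Im\, m(z) = N\eta\,\Im\, \tilde m^{(1)}(z) + \Im\, y . \tag{6.29} \]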
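Using (6.24) and (6.27) we obtain

\[ y = \eta z\, G_{11}\bigl(\mathbf{x}_1, (\mathcal{G}^{(1)})^2 \mathbf{x}_1\bigr) = \sum_{k=1}^{\infty} y_k, \qquad y_k := \eta z\, m_c\, (-B)^{k-1} \bigl(\mathbf{x}_1, (\mathcal{G}^{(1)})^2 \mathbf{x}_1\bigr) . \]

Since $z$ and $m_c$ are $O(1)$, together with (6.22) and (6.26) we see that the bounds

\[ |y_k| \le O\bigl(N^{-k/3+C\varepsilon}\bigr) \tag{6.30} \]

hold with $\zeta$-high probability. Consequently, using (6.29), the expansion

\[ F\bigl(N\eta\,\Im\, m(z)\bigr) - F\bigl(N\eta\,\Im\, \tilde m^{(1)}(z)\bigr) = \sum_{k=1}^{3} \frac{1}{k!}\, F^{(k)}\bigl(N\eta\,\Im\, \tilde m^{(1)}(z)\bigr)\, (\Im\, y)^k + O\bigl(N^{-4/3+C\varepsilon}\bigr) \tag{6.31} \]

holds with $\zeta$-high probability.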

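Now we estimate each of the three terms ($k = 1, 2, 3$) on the r.h.s. of (6.31) individually. First,
using (6.30) we obtain that

\[ F^{(3)}\bigl(N\eta\,\Im\, \tilde m^{(1)}(z)\bigr)\, (\Im\, y)^3 = F^{(3)}\bigl(N\eta\,\Im\, \tilde m^{(1)}(z)\bigr)\, (\Im\, y_1)^3 + O\bigl(N^{-4/3+C\varepsilon}\bigr) \tag{6.32} \]

holds with $\zeta$-high probability. Moreover, we have

\[ \mathbb{E}_1 (y_1)^3 = \mathbb{E}_1 (\eta z m_c)^3 \bigl(\mathbf{x}_1, (\mathcal{G}^{(1)})^2 \mathbf{x}_1\bigr)^3 = (\eta z m_c)^3 \sum_{k_1,\dots,k_6=1}^{M} \mathbb{E}_1\Bigl(\prod_{i=1}^{6} x_{k_i 1}\Bigr) \prod_{i=1}^{3} \bigl[(\mathcal{G}^{(1)})^2\bigr]_{k_{2i-1}k_{2i}} , \tag{6.33} \]

where $\mathbb{E}_1$ is the expectation value with respect to $\mathbf{x}_1$, the first column of $X$. Since
$\Im\, y_1 = (y_1 - \bar y_1)/(2i)$, the quantity $\mathbb{E}_1(\Im\, y_1)^3$ is a linear combination of sums of the same form, with
some of the factors replaced by their complex conjugates, so it suffices to discuss (6.33). Recall that
$m_k$ denotes the $k$-th moment of $\sqrt{M}\,x_{ji}$. If there is an index $k_i$ which is different from all the
others in the product $\prod_{i=1}^{6} x_{k_i 1}$, then

\[ \mathbb{E}_1\Bigl(\prod_{i=1}^{6} x_{k_i 1}\Bigr) = 0 \qquad (\text{since } m_1 = 0), \]

and if each $k_i$ appears exactly twice, then

\[ \mathbb{E}_1\Bigl(\prod_{i=1}^{6} x_{k_i 1}\Bigr) = M^{-3} m_2^3 . \]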

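Isolating the above two cases from the sum (6.33), we have

\[ \mathbb{E}_1 (y_1)^3 = \tilde A_3(X^{(1)}, m_1, m_2) + (\eta z m_c)^3 \sum_{\mathcal{A}} \mathbb{E}_1\Bigl(\prod_{i=1}^{6} x_{k_i 1}\Bigr) \prod_{i=1}^{3} \bigl[(\mathcal{G}^{(1)})^2\bigr]_{k_{2i-1}k_{2i}} , \]

where $\mathcal{A}$ denotes the set of indices $k_i \in \{1, 2, \dots, M\}$ such that (1) no $k_i$ appears exactly once
in the product $\prod_{i=1}^{6} x_{k_i 1}$, and (2) there is an index $k_i$ which appears at least three times.
Clearly, the functional $\tilde A_3(X^{(1)}, m_1, m_2)$ depends only on $X^{(1)}$, $m_1$ and $m_2$. Furthermore, it
readily follows that

\[ \#\mathcal{A} \le C N^2 . \]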

Then using (6.23) and the bounds on the $m_k$'s, it follows that

\[ \mathbb{E}_1 (\Im\, y_1)^3 = \tilde A_3(X^{(1)}, m_1, m_2) + O\bigl(N^{-2+C\varepsilon}\bigr) . \tag{6.34} \]
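
The exponent in (6.34) can be traced by simple power counting. The sketch below assumes $M \asymp N$, $\eta \le N^{-2/3}$, the entrywise bound (6.23) on $(\mathcal{G}^{(1)})^2$, $\#\mathcal{A} \le CN^2$, and that the moments of $\sqrt{M}\,x_{k1}$ up to order six are bounded by a constant (as sub-exponential tails would give); constants are not tracked.

```latex
\Bigl| (\eta z m_c)^3 \sum_{\mathcal{A}}
   \mathbb{E}_1\Bigl(\prod_{i=1}^{6} x_{k_i 1}\Bigr)
   \prod_{i=1}^{3} \bigl[(\mathcal{G}^{(1)})^2\bigr]_{k_{2i-1}k_{2i}} \Bigr|
 \;\le\; C\,\eta^{3} \cdot \#\mathcal{A} \cdot M^{-3} \cdot \bigl(N^{1/3+C\varepsilon}\bigr)^{3}
 \;\le\; C\, N^{-2} \cdot N^{2} \cdot N^{-3} \cdot N^{1+C\varepsilon}
 \;=\; C\, N^{-2+C\varepsilon}.
```

The gain comes from the restriction to $\mathcal{A}$: at most two of the six indices are free there, against three for the fully paired terms that are absorbed into the functional.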

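It is easy to prove that $|N\eta\,\Im\, \tilde m^{(1)}| \le N^{C\varepsilon}$ with $\zeta$-high probability. Using (6.32) and the fact
that $\tilde m^{(1)}$ only depends on $X^{(1)}$, we have

\[ \mathbb{E} F^{(3)}\bigl(N\eta\,\Im\, \tilde m^{(1)}(z)\bigr)\, (\Im\, y)^3 = A_3(X^{(1)}, m_1, m_2) + O\bigl(N^{-4/3+C\varepsilon}\bigr) , \tag{6.35} \]

where $A_3(X^{(1)}, m_1, m_2)$ depends only on the distribution of $X^{(1)}$, $m_1$ and $m_2$.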
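Now we estimate the term with $F^{(2)}$ in (6.31). As in (6.32), we have

\[ F^{(2)}\bigl(N\eta\,\Im\, \tilde m^{(1)}(z)\bigr)\, (\Im\, y)^2 = F^{(2)}\bigl(N\eta\,\Im\, \tilde m^{(1)}(z)\bigr) \bigl[ (\Im\, y_1)^2 + 2 (\Im\, y_1)(\Im\, y_2) \bigr] + O\bigl(N^{-4/3+C\varepsilon}\bigr) . \tag{6.36} \]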
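By definition,

\[ y_2 = \eta\, \bigl(\mathbf{x}_1, (\mathcal{G}^{(1)})^2 \mathbf{x}_1\bigr) \Bigl[ C_1(z)\, \bigl(\mathbf{x}_1, \mathcal{G}^{(1)} \mathbf{x}_1\bigr) + C_2(z) \Bigr] , \]

where $C_1(z), C_2(z) = O(1)$ are constants which depend only on $z$ and $m_c(z)$. Using the
bounds on $\mathcal{G}^{(1)}$ in (6.23), as in (6.34), we have

\[ \mathbb{E}_1 \bigl[ (\Im\, y_1)^2 + 2 (\Im\, y_1)(\Im\, y_2) \bigr] = A_2(X^{(1)}, m_1, m_2) + O\bigl(N^{-5/3+C\varepsilon}\bigr) , \]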

where $A_2(X^{(1)}, m_1, m_2)$ depends only on the distribution of $X^{(1)}$, $m_1$ and $m_2$. Then with
(6.36), as in (6.35), we conclude that

\[ \mathbb{E} F^{(2)}\bigl(N\eta\,\Im\, \tilde m^{(1)}(z)\bigr)\, (\Im\, y)^2 = A_2(X^{(1)}, m_1, m_2) + O\bigl(N^{-4/3+C\varepsilon}\bigr) , \tag{6.37} \]

for some functional $A_2$ which only depends on the distribution of $X^{(1)}$, $m_1$ and $m_2$.
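Finally we estimate the term with $F^{(1)}$ in (6.31). As in (6.32), we have

\[ F^{(1)}\bigl(N\eta\,\Im\, \tilde m^{(1)}(z)\bigr)\, (\Im\, y) = F^{(1)}\bigl(N\eta\,\Im\, \tilde m^{(1)}(z)\bigr) \bigl[ \Im\, y_1 + \Im\, y_2 + \Im\, y_3 \bigr] + O\bigl(N^{-4/3+C\varepsilon}\bigr) . \tag{6.38} \]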

A similar argument as in (6.37) and (6.35) yields

\[ \mathbb{E} F^{(1)}\bigl(N\eta\,\Im\, \tilde m^{(1)}(z)\bigr)\, (\Im\, y) = A_1(X^{(1)}, m_1, m_2) + O\bigl(N^{-7/6+C\varepsilon}\bigr) . \tag{6.39} \]

Inserting (6.39), (6.37) and (6.35) into (6.31), we obtain (6.20). Now, to complete the proof
of Lemma 6.7, we need to prove (6.22) and (6.23).

For (6.22), using the large deviation lemma (Lemma 1.6), we obtain that for any C > 0, 

1/2 

(6.40) 



(xi(^«)V) I ^ ^^c(iV-iTr |^«r)V^ ^ (j-J2;7^^] 

\ a Z\ J 

(\ 1/2 
-^y^ 1 =^^^ (^Qm^'\z)] 



1/2 



with $\zeta$-high probability. Then with (1.19) and (2.36), we have (6.22). For (6.23), we note

\[ \mathcal{G}^{(1)} = \frac{1}{X^{(1)} (X^{(1)})^\dagger - z} . \]

Comparing with (1.4), we see that the pair $(\mathcal{G}^{(1)}, (X^{(1)})^\dagger)$ plays the role of $(G, X)$. Since
$\sqrt{M/(N-1)}\,(X^{(1)})^\dagger$ is just an $(N-1) \times M$ random data matrix, whose entries have variance
$(N-1)^{-1}$, the results in (1.20) also hold for $\mathcal{G}^{(1)}$ with slight changes. One can easily obtain
that the bounds in (6.23) hold with $\zeta$-high probability, which finishes the proof of Lemma 6.7 and
consequently of Theorem 6.3. □